(54) Title: A METHOD AND SYSTEM FOR MANAGING CONFIDENTIAL INFORMATION

(57) Abstract: A method and a system for information management and control are presented, based on a modular and abstract
description of the information. Identifiers are used to identify features of interest in the information, and information use policies are
assigned directly or indirectly on the basis of the identifiers, allowing for flexible and efficient policy management and enforcement,
in that a policy can be defined with a direct relationship to the actual information content of digital data items. The information
content can be of various kinds: e.g., textual documents, numerical spreadsheets, audio and video files, pictures and images, drawings
etc. The system can provide protection against information policy breaches such as information misuse, unauthorized distribution
and leakage, and allows for information tracking.
A method and system for managing confidential information.



FIELD OF THE INVENTION

The present invention relates generally to the field of managing and
securing digital information. More specifically, the present invention deals with
methods for classifying, managing and tracking digital information over
the course of its lifecycle.


BACKGROUND OF THE INVENTION 
The information and knowledge assets created and accumulated by
organizations and businesses are of extreme value in the modern economic
environment. As such, managing and keeping the information and the
knowledge inside the organization, and restricting its distribution outside, is of
paramount importance for almost any organization, government entity or
business, and provides significant leverage over its value. Most of the
information in modern organizations and businesses is represented in a digital
format that can be easily distributed via digital communication networks.
However, the promptness, convenience and information availability offered
by these digital networks are accompanied by a constant hazard of information
leakage due to innocent mistakes, carelessness and malicious attempts to deliver
non-public or otherwise confidential information to unauthorized entities.
Information losses can cause anything from minor embarrassment to severe
financial damage by enabling fraud and by causing loss of business secrets and
consequent competitive advantage. In addition, such loss may expose the
organization to legal sanctions and liabilities (e.g., under the US
Gramm-Leach-Bliley Act, the US Sarbanes-Oxley Act, the US HIPAA privacy and
security regulations, and Directive 95/46/EC of the European Parliament). In
order to exploit the value of information and commercial knowledge to as large
an extent as possible, whilst mitigating risks that stem from unauthorized
dissemination of information, the information distribution needs to be carefully
and skillfully managed.

Managing information distribution includes several aspects, such as: 
• Making the information explicitly available to authorized persons so
that they can utilize the information in order to create value for the
organization.

• Assuring that the information remains intact - i.e., that the integrity
of the information is conserved.

• Restricting the information distribution to authorized persons only -
i.e., maintaining the confidentiality of the information.

• Tracking the information along its lifecycle, in order to obtain a clear
understanding of the information flow and to allow for adequate
information retention practice.

Information assets in organizations and businesses evolve dynamically
following their creation. During the evolution process, information
and knowledge are created, added and destroyed; formats change; names change,
etc. The process may have one, several or numerous contributors. Managing the
information distribution along its lifecycle is therefore an involved task. In
some cases, the information is relevant only within a limited time-window, and
the value of the information sharply decreases after some time. E.g.,
information that is relevant for predicting the price of a certain commodity at a
certain time becomes steadily less valuable as that time approaches. In other
cases, the information represents accumulated knowledge. In this case, the merit
of the information may even increase with time. This state of affairs further
complicates the information-management task.

Methods that attempt to track digital information and manage
information distribution exist. Some of these methods utilize file meta-data,
which may not be robust against changes in the file format. Other methods
utilize keyword-based classification, which tends to be either over-exclusive or
over-inclusive. Other methods restrict information usage and distribution to
particular kinds of applications, commonly referred to as Digital Rights
Management (DRM) applications. DRM applications have the disadvantage
that they hamper normal workflow and require large to massive investment
levels. Still other methods consider the binary signature of the file, but this has
the disadvantage of depending critically on the precise representation of the
data.

The above methods thus do not provide an adequate solution to the
problem of modern businesses for the reasons outlined above. The large number
of formats in which the same information can be represented, the large number
of applications that can use the same information in different ways, the large
number of kinds of storage in which the information can be kept, and the large
number of types of information distribution channels, tend to render any given
method ineffective over a business environment taken as a whole. File metadata
is often altered when the format of the file or the storage medium of the data is
changed. A binary digital signature has zero tolerance to any change in the
signed data, and keyword or key-phrase based tracking covers only a very
limited aspect of the problem.

Methods for screening and filtering of digital content also exist and are
widely used, in order, for example, to allow censorship of offending material
(e.g., pornography). These methods lack the resolution needed for effective
policy definition and enforcement, and tend to be over-exclusive or
over-inclusive.

Methods that utilize sophisticated searching algorithms in databases and
over the Internet also exist. These methods are optimized for information
retrieval and for providing answers to specific queries, and, in general, cannot
provide either for effective tracking of specific information items or for
effective policy enforcement.

Another issue that further complicates the monitoring process is the
so-called template document. In many cases, documents are derived from
template-type source documents, for example standard contracts. In these cases, the
ability to monitor and track various different documents that are derived from
the same or a similar source template cannot be based on any naive notion of
resemblance between the documents, since two documents that are derived from
the same or a similar template may be, on the one hand, very similar, while, on the
other hand, the differentiating details, such as the names of the parties to the
contract, may be of considerable importance. Tracking different derivatives of a
template document is not adequately addressed by current methods.

There is thus a recognized need for, and it would be highly
advantageous to have, a method and system that allow information tracking and
information distribution management along the information lifecycle, which
overcomes the drawbacks of current methods as described above.

SUMMARY OF THE INVENTION

According to a first aspect of the present invention, a method and a
system for information management and control are presented, based on a modular
and abstract description of the information. The system allows for flexible and
efficient policy management and enforcement, where a policy can be defined
with direct respect to the actual information content of the digital data items.
The information content can be of various kinds: e.g., textual documents,
numerical spreadsheets, audio and video files, pictures and images, drawings
etc. The system can provide protection against information policy breaches
such as information misuse, unauthorized distribution and leakage, and allows for
information tracking.

As a first step in practicing the invention, elementary information
units are defined. These elementary information units may be sentences,
sequences of words, sequences of characters, numbers, graphs, vectors,
matrices, pieces of raw data, images, etc. The system then assigns
representation-independent identifiers and indices to each elementary
information unit. An information object that consists of one or more
information units is thereafter defined by the system. For example, a textual
document can be considered as an information object, and various sequences of
words are considered as the basic information units. Within the context of this
invention, an information object is the basic ingredient on which a policy is
defined. A simple information object is an information object that can be
described as a concatenation of one or more basic information units, while a
compound information object is an information object that consists of two or
more simple information objects, possibly together with the information needed
for their combination (e.g., a textual document with an embedded numeric
worksheet). An information class is the set of all information objects on which
precisely the same policy is defined. An instance of an information object is a
specific representation of the information object, e.g., an instance of the
information object that describes a certain text may be a file that contains the said
text in MS-Word format.

In a preferred embodiment of the present invention, the information
evolution process is preferably described by a directed graph, where the nodes
in the graph represent information objects, and two nodes are connected by an
edge if and only if the two nodes share a fraction of basic information
units that is greater than a certain threshold.
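
The graph construction described above can be sketched as follows. This is only an illustrative sketch: it assumes that each information object is represented by the set of identifiers of its basic information units, that the shared fraction is measured relative to the smaller of the two sets, and that edges are directed from the earlier object to the later one. The function names and the threshold value are hypothetical.

```python
from itertools import combinations


def shared_fraction(units_a, units_b):
    """Fraction of the smaller object's units that also occur in the other."""
    if not units_a or not units_b:
        return 0.0
    return len(units_a & units_b) / min(len(units_a), len(units_b))


def build_evolution_graph(objects, threshold=0.5):
    """Return directed edges (earlier, later) between sufficiently similar objects.

    `objects` is a list of (name, unit_id_set) pairs given in creation order,
    which is an assumption of this sketch.
    """
    edges = []
    # combinations() preserves input order, so name_a always precedes name_b.
    for (name_a, units_a), (name_b, units_b) in combinations(objects, 2):
        if shared_fraction(units_a, units_b) > threshold:
            edges.append((name_a, name_b))
    return edges
```

For instance, a draft and a final version that share most of their units would be joined by an edge, while an unrelated memo would remain an isolated node.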

In another preferred embodiment of the present invention, the system
monitors and/or controls the traffic in computer networks, the access to and
storage of information in storage elements and the usage of information by
applications and their users, in order to identify and/or classify information
objects and to assign or enforce a policy in accordance with the content of the
information objects. The policy may contain one or more restrictions on the
usage of the information object.

In another preferred embodiment of the present invention, the system
extracts a descriptor for each information object, based on statistical analysis of
the information objects.

In a preferred embodiment of the present invention, the limitations on
the usage of the information object comprise limitations to at least one of the
following:

viewing the information object;

changing the information object's format;

changing the information object's representation;

changing the information object's properties;

editing the information object;

transferring the information object;

storing the information object;

printing the information object.
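
The usage limitations listed above could be modelled, for illustration, as a set of restricted actions attached to a policy. The names `Action` and `Policy` below are hypothetical and are not taken from the described system; the sketch only shows one simple way of checking whether a requested action is permitted.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    VIEW = auto()
    CHANGE_FORMAT = auto()
    CHANGE_REPRESENTATION = auto()
    CHANGE_PROPERTIES = auto()
    EDIT = auto()
    TRANSFER = auto()
    STORE = auto()
    PRINT = auto()


@dataclass(frozen=True)
class Policy:
    # Actions that the policy forbids; everything else is allowed.
    restricted: frozenset = field(default_factory=frozenset)

    def allows(self, action):
        return action not in self.restricted
```

A policy such as `Policy(frozenset({Action.PRINT, Action.TRANSFER}))` would then permit viewing and editing while refusing printing and transfer.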

In a preferred embodiment of the present invention, the defined policy
also includes adding forensic information to the content, e.g., by adding a
unique textual watermark per instance or group of instances of the information
objects.

In a preferred embodiment of the present invention, the defined policy
also includes replacing some of the content with other content.

In another aspect of the present invention, the identifiers depend only on
the content of the elementary information units, and not on their location in the
information object.

In a preferred embodiment of the present invention, the identifiers of the 
basic information units and the information objects are stored in a database. 
In another preferred embodiment of the present invention, the
information objects are clustered according to their relative distances, and a
default policy is assigned to each cluster.
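
A minimal sketch of such clustering is given below. The text does not name a clustering algorithm or a distance measure, so the greedy "leader" clustering and the Jaccard distance between sets of unit identifiers used here are assumptions chosen for brevity, as are the function names and the `max_distance` value.

```python
def jaccard_distance(a, b):
    """1 minus the ratio of shared units to all units of the two objects."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_objects(objects, max_distance=0.5):
    """Greedy clustering: an object joins the first cluster whose leader is close enough.

    `objects` maps an object name to its set of unit identifiers.
    """
    clusters = []  # each cluster is a list of (name, units); the first item is the leader
    for name, units in objects.items():
        for cluster in clusters:
            if jaccard_distance(units, cluster[0][1]) <= max_distance:
                cluster.append((name, units))
                break
        else:
            clusters.append([(name, units)])
    return [[name for name, _ in cluster] for cluster in clusters]


def assign_default_policies(clusters, policies):
    """Give every object the default policy of its cluster (matched by index)."""
    return {name: policies[i] for i, cluster in enumerate(clusters) for name in cluster}
```

In practice the default policy of each cluster would be reviewed by an administrator; the sketch simply pairs clusters and policies by position.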

In another preferred embodiment of the present invention, the distance
between the information objects is an edit distance, described, e.g., in U.
Manber: Introduction To Algorithms: a Creative Approach, pp 155-158,
Addison-Wesley Publishing Company, 1989, ISBN 0-201-12037-2, the contents
of which are hereby incorporated by reference. In this case, the elementary
symbols used for evaluating the edit distance are information units.

In another preferred embodiment of the present invention, the distances
between identifiers of the elementary information units take into consideration
their semantic differences, such that if two elementary information units have the
same semantic meaning, their identifiers will be the same.
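
As an illustration of an edit distance whose elementary symbols are information units, the sketch below computes the standard Levenshtein distance over two sequences of unit identifiers. The unit costs for insertion, deletion and substitution are an assumption; the text does not specify a cost model.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance where each symbol is an information-unit identifier."""
    # previous[j] holds the distance between the processed prefix of seq_a
    # and the first j symbols of seq_b.
    previous = list(range(len(seq_b) + 1))
    for i, a in enumerate(seq_a, start=1):
        current = [i]
        for j, b in enumerate(seq_b, start=1):
            cost = 0 if a == b else 1
            current.append(min(previous[j] + 1,          # delete a
                               current[j - 1] + 1,       # insert b
                               previous[j - 1] + cost))  # substitute a by b
        previous = current
    return previous[-1]
```

Because the comparison is made on identifiers rather than on characters, two units with the same identifier (for example, units judged to have the same semantic meaning) contribute no cost.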

In another preferred embodiment of the present invention, the
information is inspected in one or several locations within a computer network.

In another preferred embodiment of the present invention, an inspection
point is located on an internal mail server.

In another preferred embodiment of the present invention, an inspection
point is located in a file server.

In another preferred embodiment of the present invention, an inspection
point is located in a proxy server.

In another preferred embodiment of the present invention, an inspection
point is located in a database, or database accessing utility / server.

In another preferred embodiment of the present invention, an inspection
point is located in an application.

In another preferred embodiment of the present invention, an inspection
point is located in the operating system.

In another preferred embodiment of the present invention, an inspection
point is located in the file system.

In another preferred embodiment of the present invention, an inspection
point is located on an external mail server.

In another preferred embodiment of the present invention, the system 
monitors transformation of information objects via instant messaging 
applications. 
In another preferred embodiment of the present invention, the system
utilizes a SOCKS server (a widely-used circuit level gateway), or an equivalent,
in order to monitor instant messaging transportation, and to analyze the
captured messages.

In another preferred embodiment of the present invention, the system
attempts to capture and analyze files transferred via instant messaging services.
This is done by analyzing the file transfer message sent by the instant
messaging server, locating the address (e.g., IP address) and relaying the
transport on behalf of the participant that sends the file.

In another preferred embodiment of the present invention, the system
constructs default groups based on content identification. The groups are
preferably constructed based on former usage involving information objects
with similar characteristics.

In another preferred embodiment of the present invention, the system
utilizes methods that are resilient to deliberate attempts to change the apparent
content of the information, by creating identifiers that are invariant to at least
some of the transformations in the information objects.
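
One way to obtain identifiers that are invariant to some transformations is to normalize each elementary unit before hashing it. The sketch below is an assumption for illustration only: the particular normalization steps (Unicode compatibility folding, case folding, removal of punctuation and collapsing of whitespace), the use of SHA-256 and the 16-character truncation are not taken from the text.

```python
import hashlib
import re
import unicodedata


def normalize_unit(text):
    """Reduce a textual unit to a form that ignores superficial changes."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(text.split())        # collapse runs of whitespace


def unit_identifier(text):
    """Identifier that depends only on the normalized content of the unit."""
    digest = hashlib.sha256(normalize_unit(text).encode("utf-8")).hexdigest()
    return digest[:16]
```

With this scheme, changing the case, the spacing or the punctuation of a sentence leaves its identifier unchanged, whereas changing a word produces a different identifier. The identifier also does not depend on where the unit appears in the information object.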

In another preferred embodiment of the present invention, the system
contains a module operable to detect cases in which the information object has
been subjected to manipulation in order to avoid its detection, classification or
identification.

In another preferred embodiment of the present invention, the system
contains a module operable to handle various documents that are derived from
the same templates (e.g., standard contracts). The template documents are stored
in a special database as information objects. Each document that is derived from
a given template document is treated as a compound information object, that is,
it comprises template information and structure from the template information
object as well as one or more information objects that represent the additions or
changes from the basic template document. The template information object
preferably contains the minimal number of elementary information units that are
present in all the documents that are derived from the said template.

In another preferred embodiment of the present invention, the system
allows automatic inheritance of a template's policy to a derived information
object.
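
The treatment of template-derived documents can be illustrated with the following sketch. It assumes that each document is given as a collection of unit identifiers and approximates the template information object by the units common to all derived documents; the function names and the rule that an explicitly assigned policy takes precedence over the inherited one are hypothetical.

```python
def template_units(documents):
    """Units that are present in every document derived from the template."""
    unit_sets = [set(doc) for doc in documents]
    return set.intersection(*unit_sets) if unit_sets else set()


def decompose(document, template):
    """Split a derived document into its template part and its additions."""
    units = set(document)
    return {"template": units & template, "additions": units - template}


def effective_policy(template_policy, own_policy=None):
    """A derived object inherits the template's policy unless it has its own."""
    return own_policy if own_policy is not None else template_policy
```

The "additions" part is where differentiating details, such as the names of the parties to a contract, would be found, and these can be tracked separately from the shared template text.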



In another preferred embodiment of the present invention, the system
allows automatic classification of the information as per belonging to a domain
of knowledge.

In a preferred embodiment of the present invention, the system allows
for a default policy to be applied on information objects on which no previous
policy was defined. The policy is preferably also based on the domain of
knowledge to which the said object belongs.

The various unique identifiers of the information units are preferably
stored in a database. The order in which the identifiers are presented in the
document can also be stored, in order to allow for better detection and
identification.

In another preferred embodiment of the present invention, the policy can
be defined in a manner that allows a user to view / access / manipulate / copy /
transfer / print only selected information objects that are parts of the compound
information object. In this case, the system preferably utilizes a mechanism that
maintains the coherency of the text.

In another preferred embodiment of the present invention, the system
utilizes identification methods based on the edit distance between information
objects, where the basic sequence on which the edit distance is evaluated is a
subset of the sequence of identifiers of the elementary information units.

In another preferred embodiment of the present invention, the number
of identifiers of the elementary information units is reduced by performing a
random filtering or other filtering method.
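
As one possible example of such a filtering method, the sketch below keeps an identifier only if a hash of it falls into a fixed residue class. This hash-based selection is an assumption introduced for illustration (the text says only "random filtering or other filtering method"); it behaves like random sampling but always makes the same decision for the same identifier, so that two instances of the same object retain the same identifiers. The function names and the `keep_one_in` parameter are hypothetical.

```python
import hashlib


def keep_identifier(identifier, keep_one_in=4):
    """Deterministically keep roughly one identifier out of `keep_one_in`."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % keep_one_in == 0


def filter_identifiers(identifiers, keep_one_in=4):
    """Reduce a sequence of identifiers while preserving their order."""
    return [i for i in identifiers if keep_identifier(i, keep_one_in)]
```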

In another preferred embodiment of the present invention, the
identification is based on a list of salient words or elementary units that are
selected in a manner that ensures that every portion of the text that is larger than
a certain threshold is covered, i.e., contains at least one word from the list.
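
The coverage condition stated above can be checked as in the sketch below, which assumes that the text is given as a list of words and that the threshold is expressed as a number of consecutive words (`window`). The function name is hypothetical, and the sketch only verifies coverage; it does not select the salient words.

```python
def is_covered(words, salient, window):
    """True if every run of `window` consecutive words contains a salient word."""
    gap = 0  # length of the current run of non-salient words
    for word in words:
        gap = 0 if word in salient else gap + 1
        if gap >= window:
            return False
    return True
```

A list that passes this check for a given window guarantees that any excerpt of at least that many words includes a word from the list, so the excerpt can be matched against the stored identifiers.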

In another aspect of the present invention, the system scans for
pre-designated information objects in storage devices, such as the user's hard-disks,
by utilizing client software.

In another aspect of the present invention, a method and a system for
knowledge management and control are presented, based on a modular and abstract
description of knowledge. In this case, the elementary units are denoted as facts.
As a first step in practicing the invention, elementary facts (EF) are defined.
These elementary facts may be represented as sentences (e.g., "Mr. John Doe
earned 65000$ in 2001"), as entries to a database etc. The system then assigns
representation-independent identifiers and indices to each elementary fact. A
knowledge object consisting of one or more facts is thereafter defined by the
system. For example, a set of facts about Mr. John Doe can be considered as a
knowledge object. Within the context of this aspect of the present invention, a
knowledge object is the basic ingredient on which a policy is defined. A simple
knowledge object is a knowledge object that can be fully described as a set of
elementary facts, while a compound knowledge object is a knowledge object
that consists of two or more simple knowledge objects. A knowledge class is
the set of all knowledge objects on which precisely the same policy is defined.
An instance of a knowledge object is a specific representation of the knowledge
object.

In another aspect of the present invention, the system allows authorized
persons to override any automatic decision of the system.

In another aspect of the present invention, a personalized watermark is
added to any instance of the information objects while being distributed to the
various recipients.

In a preferred embodiment of the present invention, the system performs
extensive logging of all the events and the operations performed on selected
information objects along the information lifecycle.

According to another aspect of the present invention there is provided a
method for information identification comprising:

finding elementary information units within said information object; and

deducing information about the identity of said information object from
identification of said elementary information units found within said
information object.
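
The two steps of this method, finding elementary units and deducing the identity of the object from them, can be sketched as a lookup followed by a vote. The inverted index, the voting rule and the `min_fraction` threshold below are assumptions made for the example, and the function names are hypothetical.

```python
from collections import Counter


def build_index(objects):
    """Map each unit identifier to the names of the objects that contain it."""
    index = {}
    for name, units in objects.items():
        for unit in units:
            index.setdefault(unit, set()).add(name)
    return index


def identify(instance_units, index, min_fraction=0.5):
    """Return the best-matching known object, or None if the match is too weak."""
    distinct = set(instance_units)
    votes = Counter()
    for unit in distinct:
        for name in index.get(unit, ()):
            votes[name] += 1
    if not votes:
        return None
    name, hits = votes.most_common(1)[0]
    return name if hits / len(distinct) >= min_fraction else None
```

The identity returned by `identify` could then be used to look up the policy that applies to the inspected instance.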

According to a further aspect of the present invention there is provided a
method for changing automated computerized exchange of information within
an information object having overall coherency, the method comprising
selecting amongst and carrying out at least one of the following:
deleting part of said information;
replacing part of said information; and
inserting an additional part to said information,
wherein said carrying out additionally comprises the
preservation of the coherency of said information within said information
object.

Preferably, said changing of said information object is done in order to
eliminate parts having policies that do not allow for said at least one action to be
executed while they are in the document.

In an embodiment, said changing of said information object is carried
out in order to personalize said information object.

Preferably, said changing of said information object is carried out in
order to customize said information object for a specific use.

In an embodiment, said changing of said information object is done in a
manner selected to achieve at least one of the following:

preserving the coherency of said information object; maintaining
seamlessness; preserving the structure of said information object;
preserving the linguistic coherency of said information object; preserving the
formatting style of said information object; and preserving the pagination style of
said information object.

Preferably, said information objects comprise compound information
objects and wherein said changing of said information object is made to
constituent parts of a compound information object.

The method may be carried out over a network having users with
different access rights to said information object, said selecting and carrying out
being to adapt said information object to conform to access rights of one of
said users to whom said information object is released.

According to a further aspect of the present invention there is provided
apparatus for automatic information identification to enforce an information
management policy on information objects, the apparatus comprising:

a scanning module for finding elementary information units within said
information object; and

a deduction module for deducing information about the identity of said
information object from identification of said elementary information units
found within said information object, said deduced identity being usable to
obtain a corresponding policy rule for applying to said information object.



In an embodiment, said information objects comprise at least one simple
information object, said simple information object comprising one of the
following:

an elementary information unit;
a set of elementary information units; and
an ordered set of elementary information units.

Preferably, said elementary information units comprise at least one of
the following:

a sentence; a sequence of words; a word; a sequence of characters; a
character; a sequence of numbers; a number; a sequence of digits; a digit; a
vector; a curve; a pixel; a block of pixels; an audio frame; a musical note; a
musical bar; a visual object; a sequence of video frames; a sequence of musical
notes; a sequence of musical bars; and a video frame.

Preferably, said deduction module is further configured to assign
elementary information unit identifiers to elementary information units after
identification.

Preferably, said deduction module is further configured to utilize said
elementary information unit identifiers in said deducing.

In an embodiment, said information object identification is carried out
on an instance of said information object, said information object instance being
said information object in a specific format.

Preferably, said deduction module is configured to provide said
elementary information unit identifiers in a manner determined at least partly
by the content of said elementary information units to which they are assigned.

In an embodiment, said elementary information unit identifiers are
solely determined by said content.

Preferably, said deduction module is configured to provide said
elementary information unit identifiers in a manner at least partly determined
by locations within an information object of respective elementary information
units to which they are assigned.

Apparatus according to the invention may include a policy attachment
unit associated with said deduction module, said policy attachment unit being
configured to use said deducing to attach to said information object an
information object policy, said policy comprising at least one of the following:
an allowed distribution of said information object;
a restriction on distribution of said information object;
an allowed storage of said information object;
a restriction on storage of said information object;
an action to be taken as a reaction to an event;
an allowed usage of said information object; and
a restriction on usage of said information object.

Preferably, said deducing comprises utilizing conditional probabilities
for at least one of the following:
identification of information objects;
classification of information objects; and
identification of a knowledge domain of information objects.
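
By way of illustration only, conditional probabilities could be used for classification as in the sketch below, which applies a naive Bayes classifier with add-one smoothing to the unit identifiers of an object. The text does not name a specific probabilistic model, so the choice of naive Bayes, the function names and the smoothing are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(labelled_objects):
    """Count how often each unit occurs in the objects of each class."""
    unit_counts = defaultdict(Counter)
    class_counts = Counter()
    vocabulary = set()
    for label, units in labelled_objects:
        class_counts[label] += 1
        unit_counts[label].update(units)
        vocabulary.update(units)
    return unit_counts, class_counts, vocabulary


def classify(units, model):
    """Return the class with the highest posterior log-probability."""
    unit_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = (sum(unit_counts[label].values()) + len(vocabulary)) or 1
        score = math.log(n / total)  # log P(class)
        for unit in units:
            # log P(unit | class), with add-one smoothing
            score += math.log((unit_counts[label][unit] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The same scheme could be applied with knowledge domains as the classes, in which case the result would be the most probable domain of the inspected object.
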
According to a further aspect of the present invention there is provided
apparatus for automated computerized exchange of information within an
information object having overall coherency, the apparatus comprising a
selector for selecting amongst at least one of the following data modifications:
a deletion of part of said information;
a replacement of part of said information; and
an insertion of an additional part to said information,
the apparatus further comprising a data modification unit associated with
said selector for carrying out said selected modification within said information
object, said data modification unit being associated with a coherency retention
module for detecting coherency features of said information object and altering
said modification in order to preserve said detected coherency features within
said information object.

According to a yet further aspect of the present invention there is
provided apparatus for automatic information identification of information
objects, the apparatus comprising:

a scanning module for finding elementary information units within said
information object; and

a deduction module for deducing information about the identity of said
information object from identification of said elementary information units
found within said information object, said deduced identity being usable for
controlling use of said information object.




WO 2004/040464 PCT/IL2003/000889 

Preferably, said deduction module is further configured to assign 
elementary information unit identifiers to elementary information units after 
identification. 

The present invention successfully addresses the shortcomings of the
presently known configurations by providing a method and system for robust
tracking and management of information and knowledge, which can efficiently
serve digital information management, audit and control.



BRIEF DESCRIPTION OF THE DRAWINGS 
1® For a better understanding of the invention and to show how the 

same may be carried into effect, reference will now be made, purely by way of 
example, to the accompanying drawings, in which: 

Fig. 1 is a simplified schematic diagram illustrating a compound 
information object constructed from three simple information objects, in 
1 5 accordance with a preferred embodiment of the present invention; 

Figure 2 is a simplified schematic diagram illustrating notation for 
constructing a directed edge in a graph, constructed and operative in accordance 
with a preferred embodiment of the present invention; 

Figure 3 is a simplified graphical representation of the evolution process 
20 over time of an information object; 

Fig. 4 is a simplified schematic diagram depicting the representation of 
information objects using elementary information units, in accordance with a 
preferred embodiment of the present invention; 

Fig. 5 is a simplified schematic diagram illustrating one-to-one 
25 correspondence between information objects and policies, according to a 
preferred embodiment of the present invention; 

Fig 6 is a simplified block diagram illustrating a basic network with 
monitoring units configured for basic network monitoring, constructed and 
operative according to preferred embodiment of the present invention; 
30 Fig 7 is a simplified flow diagram illustrating a method for extraction of 

identifiers from an instance of an information object, constructed and operative 
according to a preferred embodiment of the present invention; 
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Fig 8 is a simplified functional process diagram illustrating 
identification of an information object instance, operative according to a 
preferred embodiment of the present invention; 

Fig 9 is a simplified flow diagram which illustrates a method for policy 
5 enforcement with respect to information object instances, operative, according to 
a preferred embodiment of the present invention; 

Figs 10A and 10B illustrate respectively an organizational structure and 
its transformation into a data structure, therefrom to define a default policy, in 
accordance with a preferred embodiment of the present invention; 
10 Fig. 1 1 is a simplified flowchart of a method that allows for document 

classification, according to a preferred embodiment of the present invention; 

Fig. 12 is a simplified block diagram illustrating a system that allows for 
document classification, according to a preferred embodiment of the present 
invention; 

Fig. 13 is a simplified flowchart illustrating a method for augmenting a
conditional access system by providing information-based clearance, according
to a preferred embodiment of the present invention;

Fig. 14 is a simplified block diagram illustrating a system for
augmenting conditional access by providing information-based clearance,
according to a preferred embodiment of the present invention;

Fig. 15 is a simplified flowchart of a method for determining the
integrity of an information object; and

Fig. 16 is a simplified block diagram illustrating a system for ensuring 
data integrity according to a preferred embodiment of the present invention. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

The present embodiments describe a method and system for managing
confidential information. In particular, the present invention describes methods
for information tracking, identification, classification and management along
the information lifecycle, utilizing a modular and abstract description of
information.

According to embodiments, a system for information management and
control is presented which uses modular and abstract descriptions of the
information. The system allows for flexible and efficient policy management
and enforcement, in which a policy can be defined with direct respect to the
actual information content of the digital information items. The information
content can be of various kinds: e.g., textual documents, numerical
spreadsheets, audio and video files, pictures and images, drawings etc. The
system can provide protection against information leakage independently of
other information management systems.

Before explaining in detail preferred embodiments of the present
invention, the following terminology and nomenclature are introduced:

• Canonization: transformation of digital data into a standard format,
e.g., transforming textual documents in various formats to plain
text in a Unicode format. In order to be able to infer information
regardless of the format, it is preferable to canonize the input data.

• An elementary information unit (EIU) is a piece of (preferably
canonized) information to which a unique identifier is assigned. These
elementary information units may be sentences, sequences of words,
sequences of characters, sequences of frames in a video content,
segments of audio files etc.

• An information object (IO) consists of one or more elementary
information units. For example, a textual document can be considered as an
information object, and various sequences of words are considered as
its elementary information units. Within the context of this invention, an
information object is the basic ingredient on which a policy may be
defined.

• A simple information object (SIO) is an information object that can
be fully described as a concatenation of basic information units
(possibly with overlaps). For example, a paragraph in a textual
document, a drawing or a "cut" in a film can be considered as a simple
information object.

• A compound information object (CIO) is an information object that
consists of an aggregation of two or more information objects (e.g., a
textual document with an embedded numeric worksheet), together
with the information that is needed for their combination. The
aggregation can be hierarchical: a compound information object can be
constructed by an aggregation of other compound information objects,
which in turn are constructed by an aggregation of other compound
information objects, etc.

• An information class is the set of all information objects on which
precisely the same policy is defined.

• An instance of an information object is a specific representation of an
information object, e.g., a file that contains a certain textual document
in MS-Word format.

• A User is an agent who accesses and/or manipulates and/or distributes
the managed information.

• A Users group is a collection of users for which a certain policy can
be defined.

o A Member based group is described by enumerating all
the members.

o A Property based group is defined by a property or a
rule that applies to a property of the users, e.g.,

■ Organization or organizational department.

■ Geographical location (e.g., a certain campus).

■ Business category (e.g., clients, suppliers, etc.).

o Groups union: users that belong to at least one of the groups
in a list of groups.

o Complementary group: all the users that are not in a
certain group, e.g., everyone except the customers.

o Group intersection: the set of users that belong to all of the
groups in a list.

A group can contain other groups, as well as users, as members.

• A Content group: the collection of all the contents on which a
certain policy is defined.

o Member based group: described by enumerating all the
identifiers of the contents in the group.

o Property based group: defined by a property or a rule
that applies to a subset of the organizational content, e.g.,

■ Format (Word, PDF, etc.)

■ Template content

■ Classification (top secret, secret, confidential)

■ Importance

■ Information types (legal, financial)

■ Allowed recipients

■ Limitation name (not for laptops, top
management)

o Groups union: contents that belong to at least one of the
groups in the union.

o Complementary group: all the contents not in a certain
group, e.g., everything except the customers' content.

o Group intersection: the set of contents that belong to all
the intersecting groups.

As above, a group can contain other groups as members.

• An owner: a user or a group of users that is allowed to define and/or
to change a policy with respect to a certain content and/or content
group that he/she owns. In many cases, the owner is the author of the
document. Owner is a property associated with a document. The owner
authorization may be defined by rules based, at least partially,
on that property. The owner of a document may be defined according to
a rule based on the event of the initial signing (creation with respect to
the policy system) of the document.

• A policy assigned to an information object comprises one or more
rules, which determine the limitations and restrictions with respect to
usage and distribution of the said information object.

• A rule within a policy is a function that maps an event, together with 
the relevant parameters, to an action. 

• An Event is a trigger that initiates execution of a pre-defined policy.
An event is specified by an event type and event parameters.

• An Action is a sequence of one or more steps executed according to
the event, the policy and the user.

A role of an entity, with respect to the policy assigned to an information object,
determines the set of authorizations given to that entity. The entity can be a user or
a computerized system. The role may, e.g., allow the entity to override a policy
assigned to an information object by some of the other entities.
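
By way of non-limiting illustration only, the group operations and the notion
of a rule defined above may be sketched as follows. Modelling a group as a
membership predicate, the event type "send", the actions "block" and "allow",
and all function and user names are assumptions introduced for this sketch;
they are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Assumption: a group is modelled as a membership predicate over user names.
Group = Callable[[str], bool]

def member_group(members: FrozenSet[str]) -> Group:
    """Member based group: defined by enumerating its members."""
    return lambda user: user in members

def groups_union(*groups: Group) -> Group:
    """Users that belong to at least one of the listed groups."""
    return lambda user: any(g(user) for g in groups)

def group_intersection(*groups: Group) -> Group:
    """Users that belong to all of the listed groups."""
    return lambda user: all(g(user) for g in groups)

def complementary_group(group: Group) -> Group:
    """All users that are not in the given group."""
    return lambda user: not group(user)

@dataclass(frozen=True)
class Event:
    event_type: str  # hypothetical event type, e.g. "send"
    user: str        # event parameter: the acting user

def distribution_rule(allowed: Group) -> Callable[[Event], str]:
    """A rule maps an event, with its parameters, to an action."""
    def rule(event: Event) -> str:
        if event.event_type == "send" and not allowed(event.user):
            return "block"
        return "allow"
    return rule

customers = member_group(frozenset({"carol"}))
legal = member_group(frozenset({"alice", "bob"}))
finance = member_group(frozenset({"bob", "dave"}))
rule = distribution_rule(complementary_group(customers))
```

Because groups compose as predicates, a group can contain other groups as
members simply by passing one group to the constructor of another.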

Before explaining at least one embodiment of the invention in detail, it
is to be understood that the invention is not limited in its application to the
details of construction and the arrangement of the components set forth in the
following description or illustrated in the drawings. The invention is capable of
other embodiments or of being practiced or carried out in various ways. In
addition, it is to be understood that the phraseology and terminology employed
herein is for the purpose of description and should not be regarded as limiting.

Reference is now made to Fig. 1, which is a simplified schematic
diagram illustrating a compound information object comprising lower-level
information entities according to a preferred embodiment of the present
invention. The compound information object 110 comprises three simple
information objects: 120, 122 and 124, and further uses auxiliary information
130. Each simple information object in turn comprises a concatenation of
elementary information units (140, 142 and 144).

The information evolution process is preferably described by a directed
graph, where the nodes in the graph represent information objects. Figure 2
illustrates the notation for constructing a directed edge in the graph. If the
information object a 210 contains all the elementary information units that exist
in the information object b 212, then the node a is connected to the node b
with an edge directed from a to b (Fig. 2a), and vice versa (Fig. 2b). If the two
nodes share a fraction of elementary information units that is greater than a
certain threshold, then the two nodes are connected by a bi-directional
edge (Fig. 2c).

Figure 3 is a simplified graphical representation
showing an evolutionary process of an original information object X 310. From
the original information object X 310 two versions, X1 320 and X2 330, are
derived. These versions are subsets of the original object, and there is therefore
an edge directed from X 310 to X1 320 and to X2 330. X1 320 and X2 330 share a
substantial quantity of elementary information units, and there is therefore a
bi-directional edge between X1 320 and X2 330. From X1 two basic versions are
derived, X1.1 340 and X1.2 345, and there is therefore an edge directed from
X1 320 to X1.1 340 and to X1.2 345. The derivations, again, share a substantial
amount of elementary information units, so there is another bi-directional edge
between X1.1 340 and X1.2 345. X1.3 350 and X1.4 355 are versions of X1.1
340 and X1.2 345, respectively, but instead of being merely subsets of X1.1 340
and X1.2 345, they also contain additional information units, which are not
present in X1.1 340 and X1.2 345. There is therefore a bi-directional edge
between X1.1 340 and X1.3 350, and a bi-directional edge between X1.2 345
and X1.4 355. Similarly, X1.5 360 is a version of X1.3 350. In this case, some
information units have been subtracted and some information units have been
added. X2.1 370 is derived from X2 330, and contains a subset of the
elementary information units of X2 330, and there is therefore an edge directed
from X2 330 to X2.1 370. X2.1.1 380 is in turn derived from X2.1 370. X2.1.1
380, again, contains only a subset of the information units of the object from
which it derives, and there is therefore an edge directed from X2.1 370 to
X2.1.1 380.

Note that in order to provide robustness, the description of an
information object in terms of basic information units contains, in general, a
high level of redundancy, and there should preferably be provided a
considerable overlap between elementary information units that are used to
describe an information object: e.g., if the information object is a "cut" of a
video content, the elementary information units may be overlapping sequences
of frames. Reference is now made to Fig. 4, which is a simplified diagram
schematically illustrating such a state of affairs: an information object 400 is
composed of the elements a 410, b 412, c 414, d 416, e 418 and f 420. These
elements can be words in a text object, frames in a video object, etc. The
elementary information units are sequences of elements: the elementary
information unit S1 430 comprises the sequence a 410, b 412 and c 414, the
elementary information unit S2 432 comprises the sequence b 412, c 414 and d
416, the elementary information unit S3 434 comprises the sequence c 414, d
416 and e 418, and the elementary information unit S4 436 comprises the
sequence d 416, e 418 and f 420. The information object can therefore be
described as a simple concatenation of S1 430 and S4 436.
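
As a minimal sketch only, the overlapping elementary information units of
Fig. 4 and the edge notation of Fig. 2 may be illustrated as follows. The
window width of three elements follows Fig. 4; the function names, the edge
labels and, in particular, the formula used for the "shared fraction" (the
number of common units divided by the number of distinct units in both
objects) are assumptions made for this sketch, since the text does not fix a
particular formula.

```python
def elementary_units(elements, width=3):
    """Return overlapping windows of `width` consecutive elements."""
    return [tuple(elements[i:i + width])
            for i in range(len(elements) - width + 1)]

def edge(units_a, units_b, threshold):
    """Classify the edge between two objects described by their unit lists.

    Assumption: the shared fraction is |A & B| / |A | B|.
    """
    a, b = set(units_a), set(units_b)
    if a > b:
        return "a->b"    # a contains every unit of b, and more (Fig. 2a)
    if b > a:
        return "b->a"    # b contains every unit of a, and more (Fig. 2b)
    if a | b and len(a & b) / len(a | b) > threshold:
        return "a<->b"   # substantial overlap (Fig. 2c)
    return None          # no edge

x = elementary_units("abcdef")    # the units S1..S4 of Fig. 4
x1 = elementary_units("abcde")    # a version that is a subset of x
x2 = elementary_units("bcdef")    # another version that is a subset of x
```

With a threshold of 0.4, `edge(x, x1, 0.4)` yields a directed edge from x to
x1, while `edge(x1, x2, 0.4)` yields a bi-directional edge, in the manner of
X, X1 and X2 in Fig. 3.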

A policy with respect to a compound information object may also be
compound, such that a different policy will be assigned to each of the simple
objects that constitute the compound information object. For example, one
simple information object from the set of information objects that constitutes
the compound information object may be classified "confidential", while
another information object may be classified as "public".

Reference is now made to Fig. 5, which illustrates one-to-one
correspondence between information objects and policies, according to a
preferred embodiment of the present invention. Compound information object
500 contains the simple information object A 510, to which policy X 512
applies, the simple information object B 520, to which the policy Y 522 applies,
and the simple information object C 530, to which the policy Z 532 applies. It is
noted that a policy may comprise ignoring a given object, or may include any
kind of default behavior. The policy can therefore also be defined as a property
of the information object. The policy can be a simple rule (e.g., "can be viewed
only by X, Y, and Z") or a complete set of rules, determining restrictions on the
usage imposed on the various members in the organization.

Such a rule-based policy may be group or logic based, attaching a
complete policy or a policy element (i.e., a directive which is a part of a policy)
to a Boolean expression, such that the policy or policy element is enforced on
all transactions which satisfy the expression. This expression may be composed
of variables whose values are determined by outside Boolean functions (such as
membership in a group, time-based functions, etc.). An equivalent
implementation may be based on group membership. In a preferred embodiment
of the present invention, some of the expressions are defined as lazy
expressions, which are not evaluated as soon as they are bound to a variable, but
only when something forces the evaluator to produce the expression's value.
The latter allows for a more efficient implementation.
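
The efficiency gain of lazy expressions may be illustrated, purely as a
sketch, as follows. The class `Lazy`, its `evaluations` counter and the two
example terms are hypothetical names introduced here; the specification does
not prescribe any particular evaluation mechanism.

```python
from typing import Callable, Optional

class Lazy:
    """A Boolean term that is evaluated on first demand and then cached."""

    def __init__(self, fn: Callable[[], bool]):
        self._fn = fn
        self._value: Optional[bool] = None
        self.evaluations = 0  # counts how often the outside function ran

    def value(self) -> bool:
        if self._value is None:
            self.evaluations += 1
            self._value = self._fn()
        return self._value

def policy_applies(is_external: Lazy, is_confidential: Lazy) -> bool:
    # `and` short-circuits: the second term is forced only if the first holds.
    return is_external.value() and is_confidential.value()

is_external = Lazy(lambda: False)       # e.g. a cheap address check
is_confidential = Lazy(lambda: True)    # e.g. a costly content comparison
applies = policy_applies(is_external, is_confidential)
```

In this example the potentially costly term is bound to a variable but never
evaluated, because the first term already decides the expression.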

In another preferred embodiment of the present invention, a language,
resembling a programming or scripting language, is used to define an ordered
calculation that results in a policy. The ordered calculation may directly result
from the language, or may be inferred from a hierarchy of calculation
dependencies.

In a preferred embodiment of the present invention, a policy is assigned
to selected nodes in the graph, and nodes may inherit the policies assigned to
their predecessors.
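
One possible reading of such inheritance is sketched below, under the
assumption that a node without an explicitly assigned policy takes the
policies of its nearest predecessors that do have one. The function name, the
dictionary-based graph representation and the policy labels are hypothetical.

```python
def inherited_policies(node, predecessors, assigned):
    """Return the node's own policy, else those of its nearest predecessors.

    `predecessors` maps a node to the nodes with an edge directed to it;
    `assigned` maps selected nodes to their explicitly assigned policy.
    """
    seen, frontier = set(), [node]
    while frontier:
        found = {assigned[n] for n in frontier if n in assigned}
        if found:
            return found
        seen.update(frontier)
        # Move one level up the graph, skipping nodes already visited.
        frontier = [p for n in frontier
                    for p in predecessors.get(n, []) if p not in seen]
    return set()

predecessors = {"X1": ["X"], "X2": ["X"], "X1.1": ["X1"]}
assigned = {"X": "confidential", "X2": "public"}
```

A set is returned because a node may have several predecessors carrying
different policies; how such a conflict is resolved is left open here.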

In another preferred embodiment of the present invention, a notion of
distance or similarity/dissimilarity is defined between any two of the
information objects. According to a preferred embodiment of the present
invention, the similarity measure between information object A and information
object B is based on the parts that are common between the two objects. For
example, if the information object A is described by 400 elementary
information units, and the information object B is described by 500 elementary
information units, out of which 300 elementary information units are common
to A and B, then the similarity between A and B is 300/400 = 0.75, and the
similarity between B and A is 300/500 = 0.6.
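
The asymmetry of this measure may be reproduced with the following minimal
sketch, in which each information object is represented by the set of its
unit identifiers. The function name and the use of integers as stand-in
identifiers are assumptions made for illustration.

```python
def similarity(a, b):
    """Fraction of the units describing `a` that also occur in `b`."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0

A = set(range(0, 400))      # 400 units describing object A
B = set(range(100, 600))    # 500 units describing object B; 300 are shared
```

Here `similarity(A, B)` evaluates to 0.75 and `similarity(B, A)` to 0.6,
matching the figures given above.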

In another preferred embodiment of the present invention, the system
utilizes an identification method based on the edit distance (described, e.g., in
Manber referred to above) between information objects, where the basic
sequences on which the edit distance is evaluated are sequences of identifiers of
the elementary information units. For example, if the identifiers of one sequence
are A B C and D and the identifiers of the second sequence are A B E C D, then
the distance is "1 insert".
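
As a sketch only, an edit distance over sequences of identifiers can be
computed with the standard dynamic-programming (Levenshtein) recurrence shown
below. Unit costs for insertion, deletion and substitution are an assumption
of this sketch; the specification does not state the cost model.

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions and substitutions from s to t."""
    prev = list(range(len(t) + 1))          # distances from the empty prefix
    for i, x in enumerate(s, 1):
        cur = [i]
        for j, y in enumerate(t, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y))) # substitute, or match
        prev = cur
    return prev[-1]

first = ["A", "B", "C", "D"]
second = ["A", "B", "E", "C", "D"]
```

For the two sequences of the example above, `edit_distance(first, second)`
is 1, corresponding to a single insertion.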

In another preferred embodiment of the present invention, the distances
between identifiers of the elementary information units take into consideration
their semantic differences, such that if two elementary information units have
the same semantic meaning, then the two elementary units share the same
identifier. In a preferred embodiment of the present invention, consideration of
semantic content is performed by mapping all synonymous words to the same
numerical identifier, such that replacing a word with its synonym would not
alter the numerical representation of the word. In another preferred embodiment
of the present invention, the identifiers of elementary information units are
also invariant to reordering of words.
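
A minimal sketch of an identifier that is invariant both to synonym
replacement and to reordering of words is given below. The synonym table, the
choice of SHA-256 and the truncation to 16 hexadecimal digits are assumptions
of this sketch; for brevity, synonyms are mapped to a canonical word rather
than directly to a numerical identifier.

```python
import hashlib

# Hypothetical synonym table: every synonym maps to one canonical word.
SYNONYMS = {"car": "automobile", "auto": "automobile"}

def unit_identifier(words):
    """Identifier unchanged by synonym replacement and by word reordering."""
    canonical = sorted(SYNONYMS.get(w.lower(), w.lower()) for w in words)
    digest = hashlib.sha256(" ".join(canonical).encode("utf-8")).hexdigest()
    return digest[:16]

original = unit_identifier(["the", "car", "stopped"])
reworded = unit_identifier(["stopped", "the", "auto"])
different = unit_identifier(["the", "bus", "stopped"])
```

Sorting the canonical words before hashing is what removes the dependence on
word order; omitting the sort would retain synonym invariance only.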

In another preferred embodiment of the present invention, identifiers are
based on embedding the elementary information units in a Euclidean space in a
manner that approximately preserves the pairwise semantic distances between
the elementary information units. The embedding process can utilize a method
such as the one described in N. Linial, E. London, Y. Rabinovich: The Geometry of
Graphs and Some of its Algorithmic Applications. Combinatorica 15(2): 215-
245, 1995, the contents of which are hereby incorporated by reference.

In another preferred embodiment of the present invention, the
information objects are clustered, using a clustering method that utilizes the
pair-wise distances, or any other dissimilarity measure, between information
objects. In this case, users may assign a policy to one or more representative
objects in each cluster, and a default policy is assigned to all the objects in the
cluster based on the policy assigned to the representative objects. This method
can be used for automatic creation of default groups. Methods for clustering
using a measure of similarity and/or dissimilarity between pairs of objects or for
clustering of nodes in a directed graph, such as the graph depicted in Figure 2,
are known and described, e.g., in R. O. Duda, P. E. Hart and D. G. Stork:
Pattern Classification (2nd Edition), John Wiley & Sons, Inc. 2001, ISBN
0-471-05669-3, the contents of which are hereby incorporated by reference.

In another preferred embodiment of the present invention, a default
policy is assigned to an information object according to the policies assigned to its
neighbors, e.g., using the "n nearest-neighbors rule" described, e.g., in Duda et
al. referred to above. This requires defining a measure of similarity/dissimilarity
between information objects, e.g., using the similarity/dissimilarity measure
described above.
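
By way of illustration only, a nearest-neighbors default policy may be
sketched as a majority vote among the n most similar objects that already
carry a policy. The function names, the majority vote and the example data
are assumptions of this sketch; the similarity used is the common-unit
fraction described earlier.

```python
from collections import Counter

def default_policy(units, labelled, n=3):
    """Majority policy among the n labelled objects most similar to `units`.

    `labelled` is a non-empty list of (unit_set, policy) pairs.
    """
    def similarity(a, b):
        return len(a & b) / len(a) if a else 0.0

    ranked = sorted(labelled,
                    key=lambda item: similarity(units, item[0]),
                    reverse=True)
    votes = Counter(policy for _, policy in ranked[:n])
    return votes.most_common(1)[0][0]

labelled = [({1, 2, 3, 4}, "confidential"),
            ({1, 2, 3, 9}, "confidential"),
            ({7, 8}, "public"),
            ({8, 9}, "public")]
```

An object described by the units {1, 2, 3} has two fully matching
"confidential" neighbors among its three nearest, so that policy is chosen.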

An information closure is a clique in the graph, i.e., a group of nodes
such that each node is connected to all the other members in the group. Because
of the similarity within an information closure, it is likely that the policies
regarding its constituents are similar or even equal. Furthermore, it is also likely
that exact identification of a document as being similar to a single node in the
clique may be a relatively hard task. Therefore, in another preferred
embodiment of the present invention, there is an option to define a default
policy for the clique to be applied to any case where an information item has
been identified to be part of the clique but cannot be matched to any single
node. Such a default policy may be defined explicitly, or implicitly based on the
common ground of the policies defined for the individual clique nodes.
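
One possible interpretation of the "common ground" of the policies is
sketched below, under the assumption that each policy is modelled as the set
of usages it permits, so that the implicit default permits only what every
node in the clique permits. This modelling and all names are hypothetical.

```python
def clique_default_policy(node_policies):
    """Implicit default: the usages permitted by every node of the clique."""
    policies = list(node_policies.values())
    return set.intersection(*policies) if policies else set()

clique = {"X1": {"view", "print"},
          "X2": {"view", "edit"},
          "X1.1": {"view", "print", "edit"}}
```

For the example clique the default policy permits viewing only, which is the
most restrictive reading consistent with all three nodes.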

In another preferred embodiment of the present invention, the system
extracts a descriptor for each information object, based on a series of unique
identifiers for each elementary information unit. The extracted identifiers can be
based on a hash function derived from a numerical representation of the
elementary information unit. Thus, in a case in which information units are
textual, a numeric representation of the text can be based on the ASCII or
Unicode representation of the characters.

In another aspect of the present invention, the identifiers depend only on 
the content of the elementary information units, and not on their order or 
location in the information object. 

In a preferred embodiment of the present invention, the identifiers of the
basic information units and the information objects are stored in a database, in a
manner that allows efficient retrieval. This way, the system can efficiently
utilize the identifiers in order to compare stored identifiers with identifiers of
the analyzed objects.
In another preferred embodiment of the present invention, the system
monitors the traffic in computer networks and possibly also via fax servers and
fax machines, in order to identify and/or classify information objects and to
assign or enforce a policy in accordance with the content of the information
objects. The policy may contain restrictions on the usage, integrity and
distribution of the information object.

Reference is now made to Figure 6, which is a simplified block diagram
illustrating a basic network and showing monitoring units distributed around the
network, to allow for basic network monitoring according to a preferred
embodiment of the present invention. Digital content containing information
objects resides in designated directories in a file system 602, and a policy 603
is assigned thereto as follows: a certain initial policy is assigned to each
information object, either explicitly or implicitly, implicitly meaning a default
policy. The file system 602 may reside on a dedicated server or on networked
computers belonging to the network users. The information object identifier 604
may scan the directories and extract identifiers from the information objects in
the various files. The identifiers are thereafter stored in the identifier database
606. When a user 607 subsequently attempts to disseminate digital content via
the organizational mail server 608, the organizational mail server monitoring unit
610 obtains the message and its attachments and utilizes information object
identifier 612 in order to identify the information objects in the transport. The
information object identifier 612 in turn utilizes the identifier database 606 in
order to detect information objects to which a policy has previously been
assigned. Results of the above-described detection process are sent to the policy
reference monitor 632 of the central control unit 630. The policy reference
monitor instructs the organizational mail-server monitoring unit 610 whether to
allow transmission of the inspected mail, based on the results. Thus, for
example, if the transport is being sent to an address outside the immediate
organization, and an object is found within the transport having a policy
indicating that it should not be distributed outside the organization, then the
transport is stopped.

In a preferred embodiment of the present invention, as well as blocking
the transport, or as an alternative thereto, the organizational mail-server
monitoring unit 610 may notify the user and/or the administrator about
dissemination attempts that do not comply with the assigned policy and/or about
suspected traffic. A substantially similar process may be applied to data
dissemination via an SMTP proxy 614, HTTP proxy 620 and fax server 634. In
these latter cases, the respective monitoring units (the SMTP proxy monitoring
unit 616, the HTTP proxy monitoring unit 622 and the fax monitoring unit 636)
obtain content information of the transport and utilize the respective information
object identifiers 618, 624 and 638 in order to identify the information objects
in the transport. Monitoring fax traffic can be performed, e.g., using the method
described in US provisional patent application 60/450,336 "A Method and
System for Preventing Information Leakage via Fax Machines", filed February
28th, 2003, the contents of which are hereby incorporated by reference.

In a preferred embodiment of the present invention the central control
unit 630 also plays a part in enforcement of policy, that is to say in transport
blocking. In particular the central control unit controls traffic passing the WAN
gateway 640 and can be used to prevent a particular transport from passing
outwardly via gateway 640.

Reference is now made to Fig. 7, which is a chart illustrating a method
for extraction of identifiers from an instance of an information object,
constructed and operative according to a preferred embodiment of the present
invention. The input to the method is an information object instance 710 such
as a file that contains an MS-Word document. The instance is subjected to a
pre-processing stage 720, which includes identification of the format or type 722
(e.g., identification of the file format as an MS-Word file), instance opening 724
(e.g., file opening) and canonization 726. Canonization may for example
comprise transforming the MS-Word file to plain Unicode text. The result is a
canonized information object 730. From the canonized information object 730,
one or more identifiers are then extracted 740 and are stored in an identifiers
database 750, preferably together with the corresponding policy.
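
The flow of Fig. 7 may be sketched, under stated assumptions, as follows. The
sketch starts from already opened plain-text bytes, so that format
identification 722 and instance opening 724 are not modelled. Unicode NFKC
normalization, lower-casing, whitespace collapsing, three-word windows,
SHA-256 hashing and the in-memory dictionary standing in for the identifiers
database 750 are all assumptions chosen for illustration.

```python
import hashlib
import unicodedata

def canonize(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode to Unicode, normalize, lower-case and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw.decode(encoding))
    return " ".join(text.lower().split())

def extract_identifiers(canonized: str, width: int = 3) -> set:
    """Hash each overlapping window of `width` words to a short identifier."""
    words = canonized.split()
    if not words:
        return set()
    last_start = max(len(words) - width, 0)
    windows = (" ".join(words[i:i + width]) for i in range(last_start + 1))
    return {hashlib.sha256(w.encode("utf-8")).hexdigest()[:16] for w in windows}

def register(raw: bytes, policy: str, database: dict) -> None:
    """Store every extracted identifier together with its policy."""
    for identifier in extract_identifiers(canonize(raw)):
        database[identifier] = policy

database = {}
register(b"Quarterly  results are\nCONFIDENTIAL", "internal-only", database)
```

Because canonization precedes extraction, a differently spaced or differently
cased copy of the same text yields the same identifiers.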

Fig. 8 illustrates a system and method for identification of an
information object instance. After the identifiers of information objects
are extracted and stored in a database 750 together with their assigned
policy, the system attempts to identify similar information objects within the
digital traffic and/or within storage devices, in order to enforce the assigned
policies. As in the method described in Figure 7, the inspected information object
instance 710 is subjected to pre-processing 720, a canonized information object
730 is formed and identifiers are extracted 740. The extracted identifiers 742 are
used for classification and identification by a classification and identification
module 810. The classification and identification module 810 utilizes an
identifier comparator 812 in order to compare, say, existing identifiers from the
identifiers database 750 with the extracted identifiers 742.

After evaluating the results, a decision 820 regarding the identification of
the information object is made. Based on this decision, a policy with respect to
the inspected information object is enforced: e.g., if the decision is that the
inspected information object 710 is substantially equivalent to another
information object, whose assigned policy limits its distribution to a
certain users group in the organization, then attempts to disseminate the information
object outside the group should be blocked and reported. The notion of
similarity may for example be based on the relative number of elementary
information units that exist in both information objects, as explained above.
The notion of "substantially equivalent" involves some subjective aspects, and it
is therefore preferable that the threshold for similarity is tunable, so that a given
organization can set the threshold after observing the impact of several threshold
levels on "false positive" and "false negative" rates within its information
traffic.
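
The tuning described above may be illustrated with the following sketch,
which computes the two error rates for a candidate threshold over a set of
inspected objects whose true status is known. The function name, the data
layout and the sample values are hypothetical and serve only to show the
trade-off.

```python
def error_rates(scored, threshold):
    """Return (false positive rate, false negative rate) at a threshold.

    `scored` is a list of (similarity, is_truly_protected) pairs; an object
    is flagged when its similarity reaches the threshold.
    """
    false_pos = sum(1 for s, truth in scored if s >= threshold and not truth)
    false_neg = sum(1 for s, truth in scored if s < threshold and truth)
    negatives = sum(1 for _, truth in scored if not truth)
    positives = len(scored) - negatives
    return (false_pos / negatives if negatives else 0.0,
            false_neg / positives if positives else 0.0)

scored = [(0.9, True), (0.7, True), (0.6, False), (0.2, False)]
```

With this sample, a threshold of 0.5 gives rates of (0.5, 0.0) and a
threshold of 0.8 gives (0.0, 0.5): raising the threshold trades false
positives for false negatives.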

Reference is now made to Fig. 9, which is a simplified flow diagram
illustrating a method for policy enforcement with respect to information object
instances, constructed and operative according to a preferred embodiment of the
present invention. Parts that are the same as in previous figures are given the
same reference numerals and are not referred to again except as necessary for
understanding the present embodiment. The inspected information object
instance 710 forms a first entity, and successful identification thereof is
indicated by stage (A) 910. Subsequently, the policy assigned to the identified
information object is resolved, as indicated by stage (B) 920. Subsequently, the
relevant components in the system are instructed to enforce the policy thus
identified, in a third stage (C) 930. For example, the SMTP server may be
instructed to block the transmission of the information object instance.
Subsequently, reports are sent to the relevant entities or persons, for example to
the system administrator and the sender, in a stage (D) 940, and finally the full
details of the event, which may include sender and recipient identities,
information object identifier, time and date etc., are logged in stage (E),
indicated by reference numeral 950.
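
Stages (A) to (E) may be sketched, purely for illustration, as a single
function. The policy label "internal-only", the domain "example.com", the
action names, the report recipients and the log record layout are all
assumptions of this sketch and are not prescribed by the specification.

```python
from datetime import datetime, timezone

def enforce(identifiers, database, sender, recipients, log,
            internal_domain="example.com"):
    """Run stages (A)-(E) for one inspected transport; return the action."""
    matched = [i for i in identifiers if i in database]     # (A) identify
    if not matched:
        return "allow"
    policies = {database[i] for i in matched}               # (B) resolve
    external = [r for r in recipients
                if not r.endswith("@" + internal_domain)]
    blocked = "internal-only" in policies and bool(external)
    action = "block" if blocked else "allow"                # (C) enforce
    reports = [sender, "administrator"] if blocked else []  # (D) report
    log.append({                                            # (E) log
        "time": datetime.now(timezone.utc).isoformat(),
        "sender": sender,
        "recipients": list(recipients),
        "identifiers": matched,
        "action": action,
        "reported_to": reports,
    })
    return action

log = []
database = {"id1": "internal-only"}
```

In a deployment the returned action would be passed to the relevant
monitoring unit, and the reports would be delivered rather than merely
recorded.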

In a preferred embodiment of the present invention, frequent and/or
non-salient items (common and frequent words carrying little content-specific
information) are removed from the canonized information object before the
extraction of identifiers, in order to promote the efficiency and the robustness of
the identification process.
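
A minimal sketch of such removal for textual content is given below. The
short stop-word list is illustrative only; a practical list would be derived
from word frequencies in the organization's own content.

```python
# Illustrative list of frequent, non-salient English words (an assumption).
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is"})

def remove_non_salient(text: str) -> str:
    """Drop frequent words that carry little content-specific information."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

salient = remove_non_salient("The merger of Acme and Initech is confidential")
```

The result retains only "merger Acme Initech confidential", so that
identifiers extracted afterwards depend on the content-specific words.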

In a preferred embodiment of the present invention, the limitations on
the usage of the information object comprise limitations to at least one of the
following:

viewing the information object;
editing the information object;
transferring the information object;
storing the information object;
printing the information object;
changing the representation of the information object;
changing the properties of the information object;
changing the format of the information object;
copying portions of the information object.

In a preferred embodiment of the present invention, the limitation on the
usage is application specific, i.e., the system allows usage of a specific
information object only by one specific application or some specific
applications.

In a preferred embodiment of the present invention, the limitation on the
usage is operating-system specific, i.e., the system allows usage of a specific
information object only by a specific operating system or operating systems.

In a preferred embodiment of the present invention, the limitation on the
usage is user specific, i.e., the system allows usage of a specific information
object only by a specific user or users.
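
The usage limitations listed above, together with application-specific and
user-specific restrictions, may be modelled as in the following sketch.
Operating-system-specific restrictions would follow the same pattern and are
omitted for brevity. The class names, the subset of usages shown and the
convention that an empty set means "no restriction" are assumptions.

```python
from dataclasses import dataclass
from enum import Flag, auto

class Usage(Flag):
    VIEW = auto()
    EDIT = auto()
    TRANSFER = auto()
    STORE = auto()
    PRINT = auto()
    COPY = auto()

@dataclass(frozen=True)
class UsagePolicy:
    permitted: Usage
    applications: frozenset = frozenset()  # empty set means "any application"
    users: frozenset = frozenset()         # empty set means "any user"

    def allows(self, usage: Usage, application: str, user: str) -> bool:
        return (usage in self.permitted
                and (not self.applications or application in self.applications)
                and (not self.users or user in self.users))

policy = UsagePolicy(Usage.VIEW | Usage.PRINT,
                     applications=frozenset({"viewer"}),
                     users=frozenset({"alice"}))
```

For example, `policy.allows(Usage.PRINT, "viewer", "alice")` holds, whereas
editing, or viewing through another application, is refused.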

In a preferred embodiment of the present invention, restrictions on
editing, such as restrictions on copying, cut & paste, etc., may be imposed in a
document-specific manner, i.e., the system may allow editing of a certain
document and yet prevent editing of others. Yet again, it may prevent
copying of information objects from one document to another, while permitting
copying within the document itself.

In a preferred embodiment of the present invention, defined policies
may include adding forensic information to an information object. This can be
achieved by altering parts of the information object in a manner that is
preferably substantially imperceptible, as described in PCT application number
IL02/00464, filed June 16th, 2002, the contents of which are hereby incorporated
by reference.

In a preferred embodiment of the present invention, the defined policy
also includes replacing some of the content with other content in one or more of
the copies of the original content. E.g.:

• Classified paragraphs in a document can be replaced by
unclassified paragraphs, or by null or contentless paragraphs,
rendering the document unclassified.

• Paragraphs may be added or removed according to the needs of
the recipients, in order to construct a customized document.

In these cases, the system preferably utilizes a mechanism that enables
maintaining the coherency of the text, taking into account linguistic
considerations. In order to render the changes as seamless as possible, the
system preferably also preserves the structure of the information object, the
formatting style of the information object and the pagination style of the
information object. In order to achieve that, the spaces between words and lines
and the page margins can be manipulated by the system in a manner that is
substantially unnoticeable.

In another preferred embodiment of the present invention, the
information is inspected in one or several locations within a computer data
network. Several kinds of traffic may be monitored, including instant
messengers, mail, web, file transfer protocols, chat protocols (e.g., IRC) etc.

In a preferred embodiment of the present invention, monitoring is
performed using sniffing. In the sniffing embodiment, at least one listening
node exists on the communication network and messages are analyzed in
transit, in a manner analogous to wiretapping. With sniffing, the information
may already have been transferred by the time a breach of policy is detected. It is,
however, possible to detect, monitor, log and alert regarding the illicit transfer,
and it is often possible to stop it at midpoint, mitigating some of the damage.
When implementing this method, different protocols need to be processed differently,
utilizing, e.g., the methods described in US patent application number
10/003,269 and PCT application number IL02/00037, the contents of which are
hereby incorporated by reference, as the traffic may contain a mix of protocols.

In another preferred embodiment of the present invention, monitoring is
performed using traffic forwarding. In traffic forwarding, a node through which
the traffic passes monitors the traffic in a manner similar to sniffing as
described above. However, the traffic is allowed to pass through the node only
if and when the monitoring indicates that the traffic is authorized. Traffic may
also be altered, for example by deleting an attachment, removing sensitive
information, adding a disclaimer, etc. Altering in this manner is advantageous in
that it allows for a more versatile and robust policy enforcement. In traffic
forwarding, as in sniffing, different protocols need to be processed differently,
as described in US patent application number 10/003,269 and PCT application
number IL02/00037, as the traffic may contain a mix of protocols. A router,
gateway or firewall is usually used as the forwarding node.

In another preferred embodiment of the present invention, monitoring is
performed utilizing a proxy server. The proxy server method is similar to
traffic forwarding; however, instead of monitoring all traffic, specific kinds of
traffic are required to pass through a specialized proxy server. Preferably a
firewall is used to block any and all attempts to bypass the proxy servers.
Several established proxy protocols and technologies exist (e.g., HTTP
proxying and SOCKS), which make it easier for these kinds of servers to
interface with outside systems. The SOCKS protocol is specifically designed for
secure TCP proxying in a sensitive environment. Thus the proxy server method
utilizes established support for these methods. The method has the ability to
monitor, block, and alter traffic, and yet complexity is reduced since it selects
the traffic it wishes to monitor.

The ability to alter traffic is useful in cases where it is necessary to force a
specific route for traffic (e.g., a proxy), or other behavior, and where
uncontrollable software or a protocol takes a different route. In such a case it is
possible to alter a field or fields in the controlling traffic, thus changing the
behavior of the software. Such is often the case in instant messenger software,
which generally attempts its own peer-to-peer connection. In a preferred
embodiment of the present invention, the system analyzes the file transfer
message sent by the instant messaging server, locates the address (e.g., IP
address) and relays the transport on behalf of the participant that sends the file.
Using this method, the system has access to the content of the relayed file,
which allows the system to analyze the content and to apply the required policy.

In another preferred embodiment of the present invention, the system 
utilizes a method of detecting deliberate attempts to change the apparent content 
of the information, by creating identifiers that are invariant to at least some of 
the transformations in the information objects. 
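For example, consider a spreadsheet table with m rows and n columns.
Denote by X_{ij} the variables in the table, where i denotes the index of the row and
j denotes the index of the column. The identifier of the original file is comprised
of the 3(m+n+1) numbers, i.e., the first three moments of the whole table, of
each row and of each column:

\[
\langle X^{k} \rangle = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij}^{k}, \qquad
\langle X_{i\cdot}^{k} \rangle = \frac{1}{n} \sum_{j=1}^{n} X_{ij}^{k}, \qquad
\langle X_{\cdot j}^{k} \rangle = \frac{1}{m} \sum_{i=1}^{m} X_{ij}^{k}, \qquad k = 1, 2, 3
\]

i = 1..m, j = 1..n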
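Denote by Y_{ij} the variables in the examined table, with m' rows and n'
columns. The identification signature of this table is comprised of the 3(m'+n'+1)
numbers:

\[
\langle Y^{k} \rangle = \frac{1}{m'n'} \sum_{i=1}^{m'} \sum_{j=1}^{n'} Y_{ij}^{k}, \qquad
\langle Y_{i\cdot}^{k} \rangle = \frac{1}{n'} \sum_{j=1}^{n'} Y_{ij}^{k}, \qquad
\langle Y_{\cdot j}^{k} \rangle = \frac{1}{m'} \sum_{i=1}^{m'} Y_{ij}^{k}, \qquad k = 1, 2, 3
\]

i = 1..m', j = 1..n'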

A comparison scheme is described, using the simple case in which m =
m' and n = n', that is robust against permutations of the elements of the table and
against a linear transformation that is applied to all the elements. For example, such
a transformation may multiply all the elements by a constant, or may add a constant to all the
elements, or both. The scheme is described by the following algorithm:



Algorithm:

Define tolerance variables \epsilon_1, \epsilon_2 and \epsilon_3

Assign: equal = 0;
Evaluate:

    \Delta_1 = | \langle X \rangle - \langle Y \rangle |
    \Delta_2 = | \langle X^2 \rangle - \langle Y^2 \rangle |
    \Delta_3 = | \langle X^3 \rangle - \langle Y^3 \rangle |

If \Delta_1 < \epsilon_1 and \Delta_2 < \epsilon_2 and \Delta_3 < \epsilon_3
    equal = 1
Else // check for linear relation of the form y = ax + b
    Evaluate:

    b' = \langle Y \rangle - a' \langle X \rangle

    \Delta' = \langle Y^3 \rangle - a'^3 \langle X^3 \rangle - 3 a'^2 b' \langle X^2 \rangle - 3 a' b'^2 \langle X \rangle - b'^3

    If \Delta' < \epsilon'
        equal = 1
    End
End
End

The above algorithm does not depend on the order of the elements in the
table, and is robust to linear transformation that may occur, e.g., while
transforming financial data from dollars to Euros.
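
The comparison scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: the names `moments` and `tables_match` and the default tolerances are hypothetical, and the slope estimate a' is an assumption introduced here (the square root of the ratio of the two variances, which is what y = ax + b implies for a > 0). The residual of the third-moment check is compared in absolute value.

```python
import math
from typing import Sequence, Tuple

Table = Sequence[Sequence[float]]


def moments(table: Table) -> Tuple[float, float, float]:
    """First three raw moments <X>, <X^2>, <X^3> over all cells of the table."""
    values = [v for row in table for v in row]
    count = len(values)
    m1 = sum(values) / count
    m2 = sum(v ** 2 for v in values) / count
    m3 = sum(v ** 3 for v in values) / count
    return m1, m2, m3


def tables_match(x: Table, y: Table,
                 eps: Tuple[float, float, float] = (1e-6, 1e-6, 1e-6),
                 eps_linear: float = 1e-6) -> bool:
    """True if y looks like a permuted and/or linearly transformed copy of x."""
    x1, x2, x3 = moments(x)
    y1, y2, y3 = moments(y)

    # Direct comparison of the moments; the order of the cells is irrelevant.
    if abs(x1 - y1) < eps[0] and abs(x2 - y2) < eps[1] and abs(x3 - y3) < eps[2]:
        return True

    # Check for a linear relation y = a*x + b.
    var_x = x2 - x1 ** 2
    var_y = y2 - y1 ** 2
    if var_x <= 0 or var_y < 0:
        return False  # constant table: the slope cannot be estimated
    a = math.sqrt(var_y / var_x)  # ASSUMPTION: a > 0, estimated from the variances
    b = y1 - a * x1
    residual = y3 - a**3 * x3 - 3 * a**2 * b * x2 - 3 * a * b**2 * x1 - b**3
    return abs(residual) < eps_linear
```

For instance, a table whose cells were shuffled and converted with y = 0.9x + 2 would still be reported as matching, while a table with unrelated values would not.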

In order to provide further robustness against cases in which only some
of the rows or columns exist in the analyzed table, a similar comparison is also
performed with respect to statistical moments of each row and column. The
matches are counted and if the number of matched columns or matched rows is
greater than a certain threshold, an equal flag is set to "1". The threshold can be
set according to specific needs, which may stem from the sensitivity of the data
and the tolerated level of false alarms.
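
A row-and-column variant of the comparison might look like the sketch below. It is illustrative only: the function names and the tolerance are hypothetical, and for brevity it compares the moments directly, without the linear-relation check.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def vector_moments(values: Vector) -> Tuple[float, float, float]:
    """First three raw moments of one row or one column."""
    count = len(values)
    return (sum(values) / count,
            sum(v ** 2 for v in values) / count,
            sum(v ** 3 for v in values) / count)


def count_matches(original: Sequence[Vector], examined: Sequence[Vector],
                  eps: float = 1e-6) -> int:
    """Number of original vectors whose moments appear among the examined ones."""
    examined_moments: List[Tuple[float, float, float]] = [
        vector_moments(v) for v in examined
    ]
    matched = 0
    for vec in original:
        m = vector_moments(vec)
        if any(all(abs(a - b) < eps for a, b in zip(m, e)) for e in examined_moments):
            matched += 1
    return matched


def partial_match(original: Sequence[Vector], examined: Sequence[Vector],
                  threshold: int) -> bool:
    """Set the 'equal' flag if enough rows or enough columns match."""
    rows = count_matches(original, examined)
    cols = count_matches(list(zip(*original)), list(zip(*examined)))
    return rows > threshold or cols > threshold
```

With a 3x3 original and an examined table holding only two of its rows (with cells reordered), two rows match and no columns match, so the flag is set for a threshold of 1 but not for a threshold of 2.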

In another preferred embodiment of the present invention, the system
identifies graphical information objects, e.g., drawings, utilizing computerized
image-matching techniques, described in L.G. Brown: A survey of image
registration techniques, ACM Computing Surveys, 24(4): 325-376, December
1992, the contents of which are hereby incorporated by reference.

In another preferred embodiment of the present invention, graphical
information objects are identified by extracting key features of the objects, such
as lines and their orientation, relationships between components, shapes, color
and/or intensity distribution etc.

In another preferred embodiment of the present invention, pre-processing
takes place on the images in order to facilitate the identification
technique, such as scene detection, canonizing size, canonizing orientation,
canonizing color, removing color, reducing noise, enhancing area separation,
enhancing borders, enhancing lines, sharpening, blurring etc.

In another preferred embodiment of the present invention, there exist
several indices and/or identification systems, such that the resolution and
accuracy of identification differ between the indices and/or identification
systems. For example, one system may be very robust but provide a low
resolution, another may provide high resolution with lower reliability, and a third
one may provide high resolution, reliability, robustness and accuracy at the cost of
high resource use.

The present embodiment may provide several advantages including the
following:

■ The ability to pinpoint a closely related segment by using a very
accurate index, while achieving robustness and reliability with another
index, and speed with yet another one. In most cases the same basic
algorithm can be used, but with different parameters, options and
thresholds. Several such options for selection of descriptors,
preprocessing, and other such options are described below (e.g.,
descriptors resistant to manipulation and permutation).

■ The ability to rely upon locality of similarity in order to increase
reliability. Such ability can be achieved by initially requiring high
thresholds for general similarity, but then calculating a new threshold
based on the size of the portion of the document that contains the
already found similarity. The localized similarity then requires
a lower threshold. A suitable threshold calculation may use a
monotonically increasing function of the locality size, such that the
required percentage of similarity is lower for localized similarity. For
example, a percentage-based threshold function may be used, based on the
locality size (i.e., the size of the area in which similarity was found to be
localized) in percentage, where the threshold starts at a given minimal
percentage for localized similarity and increases linearly up to a maximal
percentage where there is no significant localization of similarity (i.e., the
locality size is 100%).
For example: X is the percentage of the similarity area out of the whole
document (X=L/D where L is the size of the similarity area, and D is
the size of the document), Y is the threshold in percentage terms of the
document (Y=S/D where S is the similarity within L), so
Y=max(0.2,X/2).

■ Increasing speed and accuracy of detection, by using a series of
increasingly accurate indices, each applied to the suspect list produced in the
previous stage. Thus the suspect list shortens after each index.

In another preferred embodiment of the present invention, the system contains a module
operable to detect cases in which the information object has been subjected to
manipulation in order to avoid its detection, classification or identification.
Thus, for example, for a manipulation that comprises permutation of some of
the words, one may use two types of descriptors: one that is not sensitive to the
order of the words, and one that is sensitive. The system can thereafter identify
the content utilizing the type of descriptors that is not sensitive to the order of
the words (e.g., histograms of the frequencies of the various words), and
then determine, to a certain level of probability, that the words have
been subjected to permutations, utilizing the order-sensitive type of descriptors. In
cases where the manipulation comprises replacing characters with other
characters, for example some kind of substitution or transposition cipher, the
resulting content may include many words that are not real words, and are
therefore not included in dictionaries. The analyzer can easily detect such a
state, label the content as "suspected" and transfer the suspected content for a
more thorough (possibly manual) analysis.
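
The two-descriptor idea can be illustrated with a short Python sketch. The choice of descriptors is an assumption made here for illustration: a word-frequency histogram stands in for the order-insensitive descriptor and the set of adjacent word pairs for the order-sensitive one; the function names and the 0.5 overlap threshold are hypothetical.

```python
from collections import Counter
from typing import Set, Tuple


def word_histogram(text: str) -> Counter:
    """Order-insensitive descriptor: frequency of each word."""
    return Counter(text.lower().split())


def word_pairs(text: str) -> Set[Tuple[str, str]]:
    """Order-sensitive descriptor: the set of adjacent word pairs."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def compare(original: str, suspect: str, min_pair_overlap: float = 0.5) -> str:
    """Classify the suspect text relative to the original."""
    if word_histogram(original) != word_histogram(suspect):
        return "different"
    pairs = word_pairs(original)
    if not pairs:
        return "same order"
    overlap = len(pairs & word_pairs(suspect)) / len(pairs)
    # Same words but few shared adjacent pairs suggests the words were permuted.
    return "same order" if overlap >= min_pair_overlap else "permuted"
```

A text whose words were reversed keeps its histogram but shares no adjacent pairs with the original, so it would be reported as "permuted".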

In a preferred embodiment of the present invention, the analyzer uses 
one or more of the following criteria for determining the plausibility that a 
certain document was subjected to manipulation in order to avoid detection: 
• Irregular word distribution patterns, including the presence of
unknown words and a disproportionate lack of common words.

• Irregular distribution of characters.

• Alphabetic characters mixed with non-alphabetic and numeric
characters in words, e.g., M1cr0S0ft, L00k, 1amp, which are
changes that might be believed to fool monitoring software.

• Irregular distribution of word lengths, especially large 
proportions of long words, or all words being of uniform length. 

• Files which do not appear to be of the claimed format, thus
appearing to be unopenable, for example a text file which starts
with a zip header.

• Encryption, or any other kind of encoding.

• Incompatibility with expected punctuation and capitalization
rules, such as apparently ending a paragraph in mid-sentence.

• Disproportionate number of spelling mistakes, especially in 
applications that allow spell-checking. 
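
A few of the criteria listed above can be sketched as simple checks in Python. This is a hedged illustration, not the analyzer of the invention: the function name, the flag labels, the thresholds and the caller-supplied vocabulary are all hypothetical. The only external fact relied upon is that ZIP archives begin with the bytes `PK\x03\x04`.

```python
import re
from typing import List, Set

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature of a ZIP archive


def suspicion_flags(data: bytes, claimed_format: str, vocabulary: Set[str],
                    max_unknown: float = 0.5, max_mixed: float = 0.2) -> List[str]:
    """Return the (hypothetical) manipulation criteria that the data triggers."""
    # A file claimed to be text that starts with a zip header.
    if claimed_format == "txt" and data.startswith(ZIP_MAGIC):
        return ["format-mismatch"]

    flags: List[str] = []
    text = data.decode("utf-8", errors="replace")
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        return flags

    # Presence of many words that are not in the dictionary.
    unknown = sum(1 for w in words if w.lower() not in vocabulary) / len(words)
    if unknown > max_unknown:
        flags.append("unknown-words")

    # Letters mixed with digits inside words.
    mixed = sum(1 for w in words
                if re.search(r"[A-Za-z]", w) and re.search(r"[0-9]", w)) / len(words)
    if mixed > max_mixed:
        flags.append("mixed-alphanumeric")

    # All words being of uniform length.
    if len(words) >= 5 and len({len(w) for w in words}) == 1:
        flags.append("uniform-word-length")
    return flags
```

Content that triggers any flag would be labeled as "suspected" and passed on for a more thorough analysis, in line with the preceding paragraphs.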

In another preferred embodiment of the present invention, the system
contains a module operable to handle various documents that are derived from
the same template (e.g., standard contracts). The template documents are stored
in a special database as information objects. Each document that is derived from
a template document is a compound information object, comprising the
template information object and one or more additional information objects
representing the added material or the differences from the basic template
document. The template information object preferably contains the minimal
number of elementary information units that are present in all the documents
that are derived from the said template.

In a preferred embodiment of the present invention, templates are
considered as a special kind of information object. The template information
object comprises a sequence of information units, similarly to simple
information objects, but the sequence also contains another kind of element,
dubbed a placeholder. Placeholders may be generic, which is to say that they
contain another sequence of information units, which may also contain
embedded information objects. Alternatively, the placeholders may be
specialized. Thus in a form or like document a placeholder may contain a
number, a name, or the like. Placeholders are useful for enhancing identification
accuracy. Because the content of the placeholders is expected to change in
different instances of the template, their content is not part of the template itself;
thus, they provide clear internal boundaries for the template.

Some examples of document templates are standard disclaimers or
headers, contracts, forms etc.

Templates can be filled with other information objects, a process defined
as instantiation. Instantiation consists of replacing the placeholders with
appropriate information objects. The result of this instantiation is another
information object, which may have its own instances, i.e. appear in specific
formats.
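
Templates, placeholders and instantiation can be modeled in a few lines of Python. The sketch below is an assumption-laden illustration: the `Placeholder` class and both function names are hypothetical, information units are represented as plain strings, and each placeholder is filled by exactly one unit, which is a simplification of the generic placeholders described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass(frozen=True)
class Placeholder:
    """A named slot whose content varies between instances of the template."""
    name: str


TemplateElement = Union[str, Placeholder]


def instantiate(template: Sequence[TemplateElement],
                values: Dict[str, str]) -> List[str]:
    """Replace every placeholder with the information unit supplied for it."""
    return [values[e.name] if isinstance(e, Placeholder) else e for e in template]


def is_instance_of(template: Sequence[TemplateElement],
                   units: Sequence[str]) -> bool:
    """True if the fixed units agree; placeholder positions may hold anything."""
    if len(units) != len(template):
        return False
    return all(isinstance(t, Placeholder) or t == u
               for t, u in zip(template, units))
```

Because the placeholder positions are excluded from the comparison, two contracts that differ only in the party names are both recognized as instances of the same template.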

Templates can be defined in two ways: manual and automatic, as
follows.

In manual definition, the templates are explicitly defined. By its very
nature, manual definition specifies the exact intention of the person doing the
definition, albeit at the price of requiring his or her time and attention. Manual
definition is ideally the common method, and is especially useful for forms and
standard contracts, but is also quite useful for disclaimers, headers, footers, etc.
One of the main advantages of manual definition is the relative simplicity of
providing explicit definitions for placeholders. That is to say, the user actively
defines the placeholders. It will be appreciated that placeholders are optional,
meaning that templates are not required to have placeholders, but they are an
integral part of the template definition when they are provided. Another
advantage of manual definition is the perfect precision that is possible
therewith. The person carrying out the definition can label or annotate the
placeholder as desired, for example by identifying the type of information that
is intended to be inserted at that point. In short, placeholders can readily be
identified by the person doing the definition and can be explicitly defined, and
an advantage of having placeholders is that they make it easier to identify
passing information as being an instance of a given template and a difference or
delta. The better defined the placeholders, the easier it becomes to evaluate the
deltas between the template and the instance.

In the automatic mode, the system
identifies substantially identical sections (identical information units) in
different documents, by direct comparison or according to an index or other
indirect method, and decides that they should be defined as template candidates.
Manual intervention may be needed to approve, classify, or fine-tune the
selection. Automatic template definition is obviously easier from the user
perspective, and may often be more accurate, especially when access to the full
text of both documents or versions is possible. In a preferred embodiment of the
present invention, text matching and parsing is used in pinpointing the template.

The introduction of these templates is useful to prevent false
identification of differently classified documents that contain them, or when the
templates themselves have different classification.

In a preferred embodiment of the present invention, the system allows
for declaring "ignored sections", i.e., information objects and/or sets of
elementary information units which the system can ignore while carrying out
identification.

In another preferred embodiment of the present invention, the system
allows automatic classification of information according to a current domain of
knowledge or to organizational departments (e.g., "legal", or "medical")
utilizing keywords and clustering methods specific to the domain. Knowing
about the specific domain allows a more sophisticated policy assignment.

In a preferred embodiment of the present invention, the system allows
for a default policy to be applied to information objects for which no previous
policy has been defined. The policy is preferably also based on the domain of
knowledge to which the said object belongs.

The various unique identifiers of the information units are preferably
stored in a database. The order in which the identifiers are presented in the
original information object can also be stored, in order to allow for a better
detection.

In another preferred embodiment of the present invention, the policy can
be defined in a manner that allows a given user to view only selected
information objects that are parts of a current compound information object.
Since policy can be assigned to each information object, and since the policy
can specify who may view the particular information object, this feature can be
readily implemented. In this case, the system preferably utilizes a mechanism
that enables maintaining of the coherency of the text, taking into account
linguistic considerations.

In another preferred embodiment of the present invention, the number of
the identifiers of the elementary information units is reduced by performing
random filtering, for example disregarding all the identifiers which end in "0",
or other filtering methods, in order to reduce computational and memory
resources. Reducing the number of identifiers may reduce the redundancy level
and the robustness level, and an optimal reduction should therefore be derived
as a trade-off between the allocated resources and the required robustness level.
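
The filtering step can be illustrated with a small Python sketch. It assumes, for illustration only, that identifiers are non-negative integers; the function name and the `dropped_last_digits` parameter are hypothetical. The filter is deterministic, so applying the same rule to the stored identifiers and to the identifiers of an examined document keeps the surviving identifiers comparable.

```python
from typing import Iterable, List


def filter_identifiers(identifiers: Iterable[int],
                       dropped_last_digits: str = "0") -> List[int]:
    """Disregard identifiers whose decimal form ends in one of the given digits."""
    return [i for i in identifiers if str(i)[-1] not in dropped_last_digits]


stored = filter_identifiers([10, 21, 32, 40, 55])
examined = filter_identifiers([21, 40, 55, 77])
common = set(stored) & set(examined)  # identifiers still shared after filtering
```

Widening `dropped_last_digits` (for example to "012") discards more identifiers and saves more resources, at the cost of redundancy and robustness, which is the trade-off described above.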

In another preferred embodiment of the present invention, identification
may be based on a list of salient words and their respective distances, which are
selected in a manner that assures that every portion of the text that is larger than
a certain threshold (e.g., 10 words) contains at least one word from the list. The
salient words are preferably not common and should convey distinctive power
that enables identification of the information object.

In another aspect of the present invention, the system allows authorized
persons to override automatic decisions of the system, in order to handle cases
of misidentification or exceptional cases, preferably using a specialized utility
activated from the user interface. In a preferred embodiment of the present
invention, the system policy determines the scope of the system decisions that
can be overridden by the various authorized persons in the organization. In a
preferred embodiment of the present invention, an override operation may
require a high level of authentication on behalf of the authorized person.

In a preferred embodiment of the present invention, the system performs
extensive logging of all the events and operations performed on selected
information objects along the information lifecycle. The events and operations
are preferably stored in a database, and in an embodiment the identifier of the
information object is used as the main index. This allows a better understanding
of the genealogy of the information object, and provides a useful tool for
information management. One can understand the various phases in the
development of the information object and gain an understanding of the entire
process that produced the object, regardless of the various file formats and the
software applications (e.g., word processors) used.

In a preferred embodiment of the present invention the system enforces
a policy regarding information objects with respect to edit actions, such as
copying and cut & paste, in a manner that allows performing of such actions
only within a single document. In a preferred embodiment of the present
invention this is achieved by monitoring the information held in the clipboard
on an individual computer in order to identify information objects.
Identification of the objects then leads to enforcement of the associated policy
using a software client installed in the individual computer.

In a preferred embodiment of the present invention the system imports
information regarding the organizational structure from organizational
documents such as organization charts. The imported information is used in
order to determine the default policy, e.g.:

• Definition of departments.

• Definition of working-groups.

• Hierarchies: e.g., if user X reports to user Y, then user Y has (at
least) all the authorizations of user X.

In a preferred embodiment of the present invention, a graphical user
interface (GUI) is used in order to facilitate the transformation of organizational
structure and other organizational data into a policy regarding confidential
information management.

Reference is now made to Figs. 10a and 10b, which are two diagrams
that illustrate transformation of an organizational chart into a data-structure that
allows a default policy to be defined. Fig. 10a illustrates a fictitious
organizational chart, produced using "MS-Visio™" software, while Fig. 10b
illustrates a data-structure in an MS-Excel format, derived automatically from the
chart using the "MS-Visio™" software. From the data-structure a description
of the members of the various departments and the organizational hierarchies
can be derived, and the derivation can therefore support automatic definition of
a default policy. A rule for defining a policy from such a data structure may be
that if X reports to Y, then Y has at least all the permissions that X has, or that if
both X and Z report to Y, then X and Z are allowed to freely communicate
between themselves provided that they include ("CC") Y in their
correspondence. Such rules can be added to the organizational policies and
procedures, thereby facilitating the rapid formation of distribution policy.
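
The two example rules can be expressed in Python roughly as follows. This is a sketch under assumptions: the reporting structure is taken to be a simple mapping from each user to one manager, permissions are opaque strings, and the function names are hypothetical.

```python
from typing import Dict, Set


def effective_permissions(reports_to: Dict[str, str],
                          own: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """If X reports to Y, Y receives at least all the permissions of X."""
    effective: Dict[str, Set[str]] = {u: set(p) for u, p in own.items()}
    for user, perms in own.items():
        seen = {user}  # guards against cycles in a malformed chart
        manager = reports_to.get(user)
        while manager is not None and manager not in seen:
            effective.setdefault(manager, set()).update(perms)
            seen.add(manager)
            manager = reports_to.get(manager)
    return effective


def may_correspond(sender: str, recipient: str, cc: Set[str],
                   reports_to: Dict[str, str]) -> bool:
    """X and Z may communicate freely if both report to Y and Y is in CC."""
    manager = reports_to.get(sender)
    return (manager is not None
            and manager == reports_to.get(recipient)
            and manager in cc)
```

Each user's own permissions are pushed up the whole reporting chain, so the rule holds transitively: a manager two levels up also receives them.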

Reference is now made to Fig. 11, which is a simplified flowchart
illustrating a method that allows for document classification, according to a
preferred embodiment of the present invention. The inputs, at stage A, indicated
by 1110, consist of the following data:

• The document or the text itself

• The required level of similarity. This level can be represented in
terms of percentages (e.g., "80%") or in more qualitative terms
("high", "medium" and "low")

• Maximum number of results

• Other restrictions (e.g., creation and updating dates, formats,
"containing specified text" etc.)

The system thereafter extracts characteristics identified from the
document, as explained above, in stage B 1120. In a preferred embodiment of
the present invention, the characteristics are the numerical identifiers of the
elementary information units, as explained above. The system then compares
the characteristics with the identifiers database in stage C 1130, and then
obtains the identities of documents with the required similarity level of their
characteristics. In a preferred embodiment of the present invention, the
similarity measure between document A and document B is based on the parts
that are common between the two objects. For example, if document A is
described by 400 elementary information units, and document B is described by
500 elementary information units, out of which 300 elementary information
units are common to A and B, then the similarity between A and B is 300/400 =
0.75, and the similarity between B and A is 300/500 = 0.6. The ID may for
example be a system ID number of the respective document. The IDs of
documents having similar characteristics are obtained in stage D 1140. The
output, produced in stage E, indicated by 1150, consists of links and/or
locations or paths to the matched documents, including the filename and
preferably also a score indicating the level of matching.
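
The asymmetric similarity measure and the lookup against the identifiers database can be sketched in Python as below. The representation is an assumption for illustration: each document is a set of integer identifiers, the database is an in-memory dictionary, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def similarity(a_ids: Set[int], b_ids: Set[int]) -> float:
    """Fraction of A's elementary information units that also appear in B."""
    return len(a_ids & b_ids) / len(a_ids) if a_ids else 0.0


def find_similar(query_ids: Set[int], database: Dict[str, Set[int]],
                 min_similarity: float, max_results: int) -> List[Tuple[str, float]]:
    """Return (document ID, score) pairs that meet the required similarity."""
    hits = [(doc_id, similarity(query_ids, ids)) for doc_id, ids in database.items()]
    hits = [h for h in hits if h[1] >= min_similarity]
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:max_results]


# Worked example from the text: 400 and 500 units, 300 of them in common.
doc_a = set(range(400))
doc_b = set(range(100, 600))
```

Here `similarity(doc_a, doc_b)` evaluates to 300/400 and `similarity(doc_b, doc_a)` to 300/500, reproducing the figures given above.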

Reference is now made to Fig. 12, which is a simplified diagram that
illustrates a system for document classification, according to a preferred
embodiment of the present invention. A user operates a user interface 1210 to
provide relevant inputs 1212. These inputs may include:

• The document or the text itself

• The required level of similarity. This level can be represented in
terms of percentages (e.g., "80%") or in more qualitative terms
("high", "medium" and "low")

• Maximum number of results

• Other restrictions (e.g., creation and updating dates, formats,
"containing specified text" etc.)

The characteristics identified are extracted from the document by the
identifiers extractor 1220, as explained above. The identifiers are then compared
with the identifiers in the identifiers database 1230, resulting in a record 1240,
which contains the identities and locations of the documents with the required
similarity. The links and/or locations of the documents with the required
similarity, preferably with the matching score, are then presented to the user
1214, within the user interface 1210.

Reference is now made to Fig. 13, which is a simplified flowchart of a
method for augmenting a conditional access system by providing
information-based clearance, according to a preferred embodiment of the present invention.
The system scans the file system for identifiable information items in stage A,
indicated by 1310. The system thereafter attempts to identify information items
in stage B, indicated by 1320, and evaluates the policy attached to the identified
information items in stage C 1330. The policy attached to the information items
is then compared with the conditional access policy, stage D 1340. For example,
a certain user is authorized to view a certain file, but the file may contain an
information item that the user is not authorized to read. Thus there is a breach
in the information usage policy. In such a case, action may be taken, stage E,
indicated by 1350, e.g., removing the restricted information item from the
unauthorized domain, moving the restricted information item to an authorized
domain, notifying the owner and other relevant entities, logging the event, etc.
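
The comparison in stage D can be illustrated with a short Python sketch. The data layout is hypothetical: the conditional access policy is a mapping from file path to the users allowed to read the file, the scan result is a mapping from file path to the identified information items, and the item policy is a mapping from item to its authorized readers.

```python
from typing import Dict, List, Set, Tuple


def find_breaches(file_readers: Dict[str, Set[str]],
                  file_items: Dict[str, Set[str]],
                  item_readers: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """List (path, item, user) triples where file access exceeds item clearance."""
    breaches: List[Tuple[str, str, str]] = []
    for path, readers in sorted(file_readers.items()):
        for item in sorted(file_items.get(path, set())):
            if item not in item_readers:
                continue  # no policy attached to this item
            for user in sorted(readers - item_readers[item]):
                breaches.append((path, item, user))
    return breaches
```

Each reported triple corresponds to the situation in the example above: a user who may open the file but is not cleared for an information item inside it. The action of stage E would then be chosen per triple.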

Reference is now made to Fig. 14, which is a simplified diagram that
illustrates a system for augmenting conditional access by providing
information-based clearance, according to a preferred embodiment of the
present invention. Policy reference monitor 1410 instructs scanning and
identifying module 1420 to scan the storage 1430 (e.g., a file system). In a
preferred embodiment of the present invention, the scanning and identifying
module contains a file-system crawler. The storage may, for example, contain
a domain 1432 that is restricted to user group A and a domain 1434 that is
restricted to user group B. The scanning and identifying module 1420 uses the
identifier database 1440 in order to determine the identity of the analyzed
information items. The identities of the analyzed information items, together
with their respective policies, are stored in the document identities and policies
database 1450. The resolved identity is then sent to the policy reference monitor
1410, which utilizes the document identities and policies database 1450 in order
to resolve the corresponding policy. The policy reference monitor 1410 then
instructs the policy enforcement module 1460 to apply the appropriate action,
according to the resolved policy, on the file system 1430.

Fig. 15 is a flowchart of a method for determining the integrity of an
information object, according to a preferred embodiment of the present
invention. The system obtains as an input an information object instance in
stage A, 1510, and extracts and stores an electronic signature of the instance in
stage B, 1520. The signature is evaluated such that any change in the instance
completely destroys the signature, so for example in a case in which the
instance is a file, the signature can be a cryptographic hash of a binary
representation of the file. The instance is then subjected to a pre-processing
stage C, 1530, in order to reveal its information content. The system then
extracts and stores the information object signature in stage D, 1540, and
monitors and inspects the digital traffic according to a pre-defined policy in
order to detect integrity breaches in stage E, 1550. In the case of a breach, the
system performs the action determined by the pre-defined policy; for example,
it blocks the transport, notifies the owner, alerts the administrator, places the
message in quarantine etc., in a stage E, 1550.
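
The interplay of the two signatures can be sketched in Python as follows. Several choices here are assumptions made for illustration: SHA-256 stands in for the cryptographic hash of the instance, and a hash of case-folded, whitespace-normalized text stands in for the information object signature, which in the invention would be derived from the more robust identifiers described earlier. The function names and status labels are hypothetical.

```python
import hashlib


def instance_signature(data: bytes) -> str:
    """Signature that is destroyed by any change to the binary instance."""
    return hashlib.sha256(data).hexdigest()


def content_signature(data: bytes) -> str:
    """Signature of the information content after a simple pre-processing step."""
    text = data.decode("utf-8", errors="replace")
    normalized = " ".join(text.lower().split())  # ASSUMPTION: toy pre-processing
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def integrity_status(stored_instance_sig: str, stored_content_sig: str,
                     observed: bytes) -> str:
    """Classify observed traffic against the stored signatures of one object."""
    if instance_signature(observed) == stored_instance_sig:
        return "intact"
    if content_signature(observed) == stored_content_sig:
        return "altered"  # recognized as the object, but the instance changed
    return "unrecognized"
```

An "altered" result is the case of interest for integrity monitoring: the traffic is recognized as carrying the information object, yet its instance no longer matches the stored signature, so the pre-defined policy action would apply.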

Fig. 16 illustrates a system for assuring the integrity of an information
object, according to a preferred embodiment of the present invention. When
user A 1602 accesses an information item in classified information storage 1604,
the access inspection and control module 1606 extracts the integrity signature of
the information item and sends it to the reference monitor 1608. When user A
1602 attempts to send the information item to user B 1610 or to the manager
1616, the internal distribution inspection and control module 1612 and the
integrity inspection module 1614 between them verify the integrity of the
information item, and according to the policy dictated by the policy reference
monitor 1608 determine whether to allow or to block the information
transaction. Similarly, when either user B 1610 or the manager 1616 attempts to
send the information item externally via firewall 1620 to the internet 1622 or
via fax 1624, the external distribution inspection and control module 1618
verifies the integrity of the information item and according to the policy dictated
by the policy reference monitor 1608 determines whether to allow or to block
the information transaction.

In a preferred embodiment of the present invention, the system
interfaces with the organizational document management system. Document
management systems are useful tools for managing confidential information;
however, these systems manage files that are generally in a proprietary file
format and enforce a policy only with respect to such pre-defined file formats.
In a preferred embodiment of the present invention, the system obtains the
definition of the policy from the document management system and
supplements the document management system by enforcing policy with
respect to information objects rather than files. In this case, the system extracts
identifiers from the documents in the files and identifies and transfers
corresponding policies from the document management system. The system
then enforces the distribution policy defined by the document management
system with respect to information objects, regardless of their format. In
another aspect of the present invention, the system scans for pre-designated
information objects in storage devices, such as the user's hard disks, in order to
locate unauthorized content stored by a user, utilizing client-side software.
Preferably, tamper-resistant client-side software is used for that purpose.

In a preferred embodiment of the present invention, the system utilizes
client-side software in order to enforce a policy on the users. In a preferred
embodiment of the present invention, the client is a tamper-resistant client,
which uses a secure connection to a centralized database in which descriptors of
information objects, together with the corresponding policies, are stored.
Methods for constructing such a tamper-resistant client are described, e.g., in
US patent application 10/051,012, "A Method and a System for Securing
Digital Video", filed January 22, 2002, and in US provisional patent application
60/437,031, "A method and system for protecting confidential information",
filed December 31, 2002, the contents of which are hereby incorporated by
reference.

The client preferably monitors policy-regulated activities such as
editing, storing (particularly storing on potentially mobile devices), sending,
copying segments or printing. The client reports its findings and may prevent
actions not allowed by the policy. The client may also monitor and report
attempts to circumvent the system's protection.

In a preferred embodiment of the present invention, the correspondence
between information objects and policies and/or rules is induced by defining the
policy and/or the rule as a property of the information object.
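
As a minimal sketch of this idea, the policy can be modeled as a field carried by the information object itself, so that any component holding the object also holds its policy. The class names, field names and action labels below are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a name plus the set of actions it forbids."""
    name: str
    blocked_actions: frozenset = frozenset()


@dataclass
class InformationObject:
    """Information object whose policy is one of its own properties."""
    identifier: str
    policy: Policy

    def allows(self, action: str) -> bool:
        # The decision is read directly from the object's own policy.
        return action not in self.policy.blocked_actions


# Illustrative values only.
confidential = Policy("confidential", frozenset({"print", "send_external"}))
report = InformationObject("io-0001", confidential)
```

Here `report.allows("edit")` is true while `report.allows("print")` is false; no separate lookup table between objects and policies is required.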

During the process of knowledge acquisition, some elementary
information units are accumulated and clustered, while other elementary
information units are subtracted. In order to be able to trace the information, a
tracking channel is assigned to any information object. The tracking channel
utilizes the notions of similarity (described above) and continuity in order to
trace information evolution along the information lifecycle. The tracking
channel can utilize any of many known methods, for example those taught in D.
Hall, "Lectures in Multisensor Data Fusion and Target Tracking", Artech
House; ISBN: 1580531407; CD-ROM edition (March 2001), the contents of
which are hereby incorporated by reference.

In a preferred embodiment of the present invention, the default policy
and other aspects of tracking, usage monitoring and policy enforcement are
created and implemented taking into account at least one of the following
criteria:

• Information properties of the information object, for example
language, representation, etc.

• The operations carried out on the information object.

• The various users along the information life cycle.

• The software applications used with respect to the information
object.

• The transmission channel.

• The participant agents.

• The virtual, logical and physical location of the computers.

• Computer types (laptop, desktop, server, etc.).

In a preferred embodiment of the present invention, the system allows
defining and enforcing of a policy that prevents sending or copying of specific
information objects to a laptop computer and/or portable media.

In a preferred embodiment of the present invention, the system allows
defining and enforcing of a policy that prevents sending or copying of specific
information objects from a computerized device that is not located within the
perimeter that is subject to monitoring and inspection. In a preferred
embodiment of the present invention, this is done via a software client that
resides on the computer and monitors information usage, as explained, e.g., in
US provisional patent application 60/437,031, the contents of which are hereby
incorporated by reference.

In a preferred embodiment of the present invention, the system allows
defining of a policy that prevents sending or copying of specific information
objects unless they are encrypted to the system's satisfaction.
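
The three policies just described (no copying to laptops or portable media, no sending from devices outside the monitored perimeter, and no sending unless encrypted) can be sketched as a single decision function. Combining them into one function is an assumption made for brevity; the text presents them as separate embodiments. All names and labels below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for target device types treated as portable.
PORTABLE_TARGETS = {"laptop", "portable_media"}


@dataclass(frozen=True)
class TransferRequest:
    object_id: str
    target_type: str               # e.g. "laptop", "desktop", "server"
    source_inside_perimeter: bool  # is the sending device monitored?
    encrypted: bool                # encrypted to the system's satisfaction?


def decide(request: TransferRequest, restricted_ids: set) -> str:
    """Return "allow" or "block" for a send/copy request."""
    if request.object_id not in restricted_ids:
        return "allow"   # the policies apply only to specific objects
    if request.target_type in PORTABLE_TARGETS:
        return "block"   # no copying to laptops or portable media
    if not request.source_inside_perimeter:
        return "block"   # source device is outside the monitored perimeter
    if not request.encrypted:
        return "block"   # must be encrypted before leaving
    return "allow"


restricted = {"io-0001"}  # illustrative identifier
```

For example, `decide(TransferRequest("io-0001", "laptop", True, True), restricted)` returns `"block"`, while the same request with `target_type="server"` returns `"allow"`.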

In a preferred embodiment of the present invention, the system
automatically encrypts specific information objects using a default key, as part
of a default policy, and sends the encrypted content to the recipients. In a
preferred embodiment of the present invention, the system disseminates specific
information objects, as part of a default policy, using a secure channel such as
TLS (Transport Layer Security).

In another aspect of the present embodiments, the system creates
information about the relationships between different information collections
rather than information about the information collected within the information
collections, thus revealing information not contained in any of these collections.
The relationship information may include history and workflow information,
template-related information and other information, and is sometimes referred to
as meta-information.

This creation of information can be facilitated by comparing different
documents and locating similarities and differences. This comparison, especially
when combined with external information (such as file creation, modification
and access dates, relevant users, metadata contained in the files or in a file
management system, etc.), can be used to discern the workflow related to these
documents and their relationship to each other (e.g., document B is based on
documents A and C), and can be used for reporting or other purposes. This kind
of information is useful for document management, especially when combined
with a document management system, for archival purposes, and for reference
purposes. When document access and use is regulated by a policy, this
information is useful for policy enforcement and definition.
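
A minimal sketch of such a comparison follows. It treats sentences as the units being compared and uses a fixed threshold; both choices are assumptions made for illustration, as are the function names and sample documents. A fuller version would also use the external information mentioned above, such as file dates, to decide which document came first.

```python
def sentences(text: str) -> set:
    """Split text into a set of normalized sentences (naive split on '.')."""
    return {s.strip().lower() for s in text.split(".") if s.strip()}


def shared_fraction(source: str, derived: str) -> float:
    """Fraction of the derived document's sentences also found in the source."""
    derived_units = sentences(derived)
    if not derived_units:
        return 0.0
    return len(derived_units & sentences(source)) / len(derived_units)


def likely_sources(derived: str, candidates: dict, threshold: float = 0.3) -> list:
    """Names of candidate documents the derived document appears to draw on."""
    return sorted(
        name for name, text in candidates.items()
        if shared_fraction(text, derived) >= threshold
    )


# Hypothetical documents, used only to exercise the functions.
corpus = {
    "A": "Revenue grew in 2001. Costs were stable.",
    "C": "The audit found no issues. Controls are adequate.",
    "D": "The cafeteria menu changed.",
}
doc_b = "Revenue grew in 2001. The audit found no issues. Management is optimistic."
```

With these inputs, `likely_sources(doc_b, corpus)` reports documents A and C, matching the "document B is based on documents A and C" relationship in the text.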
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In another aspect of the present invention a method and system for 
knowledge management and control are presented, based on modular and
abstract descriptions of knowledge. In this case, the elementary units are
denoted as facts. As a first step in practicing the invention, elementary facts
(EF) are defined. These elementary facts may be represented as sentences (e.g.,
"Mr. John Doe earned $65,000 in 2001"), as entries in a database, etc. The system
then assigns representation-independent identifiers and indices to each
elementary fact. Knowledge objects, consisting of one or more facts, are thereafter
defined by the system. For example, a set of facts about Mr. John Doe can be
considered as a knowledge object. Within the context of this aspect of the
present invention, a knowledge object is the basic ingredient on which a
knowledge-oriented policy is defined. A simple knowledge object is a
knowledge object that can be fully described as a set of elementary facts, while
a compound knowledge object is a knowledge object that consists of two or
more simple knowledge objects. A knowledge class is the set of all knowledge
objects on which precisely the same policy is defined. An instance of a
knowledge object is a specific representation of the knowledge object, e.g., a
file of a database in MS-Access™ format. Utilizing this terminology and
nomenclature, a system substantially similar to the one described above can be
used in order to provide confidential knowledge management.
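
The terminology above can be sketched in code. The sketch assumes a fact is reduced to a (subject, attribute, value) triple and that its representation-independent identifier is a hash of the normalized triple; the specification does not prescribe either choice. All names, policy labels and sample values are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def fact_id(subject: str, attribute: str, value) -> str:
    """Identifier that ignores case and surrounding whitespace in the fields."""
    canonical = "|".join(p.strip().lower() for p in (subject, attribute, str(value)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class KnowledgeObject:
    name: str
    fact_ids: frozenset  # identifiers of the elementary facts it contains
    policy: str          # hypothetical policy label


def compound(name: str, parts: list, policy: str) -> KnowledgeObject:
    """Build a compound knowledge object from two or more simple ones."""
    if len(parts) < 2:
        raise ValueError("a compound knowledge object needs at least two parts")
    merged = frozenset().union(*(p.fact_ids for p in parts))
    return KnowledgeObject(name, merged, policy)


def knowledge_classes(objects: list) -> dict:
    """Group knowledge object names by policy: one class per distinct policy."""
    classes = {}
    for obj in objects:
        classes.setdefault(obj.policy, set()).add(obj.name)
    return classes


# Illustrative facts about the John Doe example; the city is invented.
salary = KnowledgeObject(
    "doe-salary", frozenset({fact_id("John Doe", "salary 2001", 65000)}), "hr-confidential")
address = KnowledgeObject(
    "doe-address", frozenset({fact_id("John Doe", "home city", "Springfield")}), "hr-confidential")
profile = compound("doe-profile", [salary, address], "hr-restricted")
```

Because the identifier is computed from normalized fields, the same fact yields the same identifier whether it was read from a sentence or from a database entry, which is the sense in which it is representation-independent in this sketch.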

The information assets within an organization (e.g., financial
information used for balance sheets) tend to evolve along a path, and undergo
various stages of processing, validation, reviewing and assurance until the final
product is produced. Maintaining the Confidentiality, Integrity & Availability
(CIA) of the information along the process poses a non-trivial problem for most
organizations.

In order to solve these problems, organizations may consistently
attempt to maintain an Intact Information Path, based on the following
methodological steps:

• Risk assessment: at a given stage, various potentially dangerous
scenarios should be considered and their impact should be
analyzed. The impact analysis should take into consideration
both aspects of legislation and liability and direct and indirect
damage to the organization from breach of the confidentiality,
integrity and availability of the information.

• Defining a well-formed information path for various types of
information. Such a path requires all of the organizational
information to be classified, and that for any type or class of
information, information regarding ownership and path is
carefully planned, defined and tested according to specific needs.
The definition of the information path may include:

■ Ownership: Selection and appointment of information 
owners and ownership hierarchy according to the owners' 
role in the organization. 

■ Storage and Availability: Definition of authorized storage 
devices and methods and their security policy and 
availability level. 

■ Access Policy & Control: definition of the access
privileges of the various entities and the required
identification, authentication and authorizations. This
well-known practice should be carefully employed in
order to assure compliance with organizational needs and
to verify that the access policy is enforced with respect to
the information assets, regardless of their representations,
instantiations and formats.

■ Usage and Processing: various entities that participate in
the information path should be entitled, or required, to
perform certain tasks, such as information processing and
filtering, integrity validation and assurance, final
approval, etc.

■ Distribution Policy: the distribution policy includes
statements and rules regarding the authorized
communication channels, authorized senders and
recipients, the authorized formats, the required recipients,
and any other restrictions and constraints with respect to
any information item.

■ Audit and Detection: There is preferably provided an
audit program that covers all of the aspects of
information access, usage, manipulation and transition.
The system preferably detects and records all the relevant
parameters in order to provide a comprehensive audit and
allow the reconstruction of the chain of events when
needed. Detection rules for irregularities may be designed
together with actions to take on events.

■ Retention Policy: In order to limit the risk of information
leakage while maintaining important information assets, a
proper information retention policy may be defined. The
retention policy may specify the minimal and/or the
maximal time for which the information should be kept,
the level of confidentiality that should be maintained
during the various stages of the information lifecycle and
possibly also the timeline for information disclosure. It is
important that the retention policy is defined with respect
to the information asset regardless of its instantiation: in
many cases, after a specific file or document was deleted,
other instances of the information item exist, e.g.,
under a different filename and/or in a different file
format, thereby exposing the organization to unnecessary
legal liabilities and perils of unwelcome
information disclosure.

• Maintaining the Intact Information Path: in order to prevent
fraud, the information flow within an organization should be
continuously monitored and inspected, so that no covert channels
are available and that the integrity and the accuracy of
disseminated or disclosed information can be confirmed.
Therefore, after the information path is defined, it should be
constantly inspected and monitored in order to assure that:

1. No unauthorized party is added to the path.

2. No entity that should be in the path is bypassed.

3. The integrity of information is preserved along the path:
information items are changed, manipulated, added or
deleted only by authorized entities and in an authorized
manner.

In order to achieve the above goals, the information path may contain 
inspection & control points, in which the distribution policy is enforced and the 
integrity of the information is verified. 
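
The three conditions listed above can be checked at an inspection and control point along the lines of the following sketch. It assumes the defined path, the observed path and the change log are available as simple lists of party names; these data shapes and all names are hypothetical.

```python
def check_path(defined_path: list, observed_path: list,
               authorized_editors: set, changed_by: list) -> list:
    """Return a list of violations of the defined information path."""
    violations = []
    # 1. No unauthorized party is added to the path.
    for party in observed_path:
        if party not in defined_path:
            violations.append(f"unauthorized party: {party}")
    # 2. No entity that should be in the path is bypassed.
    for party in defined_path:
        if party not in observed_path:
            violations.append(f"bypassed: {party}")
    # 3. Changes are made only by authorized entities.
    for party in changed_by:
        if party not in authorized_editors:
            violations.append(f"unauthorized change by: {party}")
    return violations


# Illustrative path for a financial report.
defined = ["analyst", "controller", "cfo"]
editors = {"analyst", "controller"}
```

An empty result means the path is intact; any entry would trigger the action that the policy prescribes for that event.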

In a preferred embodiment of the present invention, the policy further
includes a mandatory lifecycle. A mandatory lifecycle is a process that must be
undergone in certain circumstances, e.g., when a certain user sends a certain type
of information to a certain recipient, another predefined recipient (usually a
supervisor or auditor) must also be a recipient (usually in order to prevent fraud
and to facilitate auditing). Another example is a certain order of events that
must be enforced (e.g., the information can only be sent out after it has been
received by the legal department and after the legal department has submitted an
approved copy; the system then ensures that only the approved copy can be sent
out).

The present embodiments thus address the shortcomings of the presently
known configurations by providing a method and system for robust tracking and
management of information and knowledge, which can efficiently serve digital
information management, audit and control.
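
The two mandatory lifecycle examples described above (a required additional recipient, and an enforced order of events ending in an approved copy) can be sketched as follows. The rule table, event names and identifiers are hypothetical and serve only to illustrate the checks.

```python
def missing_recipients(sender: str, info_type: str,
                       recipients: list, rules: dict) -> set:
    """Return the mandatory recipients that are absent from `recipients`."""
    required = rules.get((sender, info_type), set())
    return required - set(recipients)


def may_send_out(events: list, copy_id: str) -> bool:
    """Allow sending only the copy approved by legal, after legal received it.

    `events` is an ordered list of (kind, copy_id) pairs.
    """
    received = False
    approved = None
    for kind, value in events:
        if kind == "legal_received":
            received = True
        elif kind == "legal_approved" and received:
            approved = value  # approval counts only after receipt
    return approved is not None and copy_id == approved


# Illustrative rule: a trader's financial reports must also go to the auditor.
rules = {("trader", "financial_report"): {"auditor"}}
events = [("legal_received", "draft-1"), ("legal_approved", "draft-2")]
```

In this sketch, `missing_recipients("trader", "financial_report", ["client"], rules)` returns `{"auditor"}`, and `may_send_out(events, "draft-1")` is false because only `"draft-2"` was approved.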

It is appreciated that one or more steps of any of the methods described
herein may be implemented in a different order than that shown, while not
departing from the spirit and scope of the invention.

While the present invention may or may not have been described with
reference to specific hardware or software, the present invention has been
described in a manner sufficient to enable persons having ordinary skill in the
art to readily adapt commercially available hardware and software as may be
needed to reduce any of the embodiments of the present invention to practice
without undue experimentation and using conventional techniques.

While the present invention has been described with reference to one or
more specific embodiments, the description is intended to be illustrative of the
invention as a whole and is not to be construed as limiting the invention to the
embodiments shown. It is appreciated that various modifications may occur to
those skilled in the art that, while not specifically shown herein, are nevertheless
within the true spirit and scope of the invention.

Although the invention has been described in conjunction with specific
embodiments thereof, it is evident that many alternatives, modifications and
variations will be apparent to those skilled in the art. Accordingly, it is intended
to embrace all such alternatives, modifications and variations that fall within the
spirit and broad scope of the appended claims. All publications, patents and
patent applications mentioned in this specification are herein incorporated in
their entirety by reference into the specification, to the same extent as if each
individual publication, patent, or patent application was specifically and
individually indicated to be incorporated herein by reference. In addition,
citation or identification of any reference in this application shall not be
construed as an admission that such reference is available as prior art to the
present invention.
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Claims: 



1. A method for monitoring information content carried in a
medium, the method comprising:
monitoring said medium for said information;
seeking elementary information units within objects of said information
being monitored in said medium;
identifying said elementary information units; and
deducing information about the content of said information objects from
identification of said elementary information units found within said objects.



2. A method according to claim 1, wherein said medium comprises
at least one of the following:
a distribution channel; and
a storage medium.

3. A method according to claim 1, wherein said information
objects comprise at least one simple information object, said simple information
object comprising one of the following:
an elementary information unit;
a set of elementary information units; and
an ordered set of elementary information units.

4. A method according to claim 1, wherein said elementary
information units comprise at least one of the following:
a sentence; a sequence of words; a word; a sequence of characters; a
character; a sequence of numbers; a number; a sequence of digits; a digit; a
vector; a curve; a pixel; a block of pixels; an audio frame; a musical note; a
musical bar; a visual object; a sequence of video frames; a sequence of musical
notes; a sequence of musical bars; and a video frame.

5. A method according to claim 1, further comprising assigning
elementary information units identifiers to elementary information units after
identification.




6. A method according to claim 5, wherein said elementary
information unit identifiers are utilized in said deducing.

7. A method according to claim 1, wherein said information object
identification is carried out on an instance of said information object, said
information object instance being said information object in a specific format.

8. A method according to claim 7, wherein said format comprises at
least one of the following:
jpeg image; gif image; Word document format; Lotus notes format;
mpeg format; text format; rich text format; Unicode text format; multi byte text
encoding format; formatted text format; ASCII text format; HTML; XML;
PDF; postscript; MS-Excel spreadsheet; MS-Excel drawing; MS-Visio drawing;
Photoshop drawing; AutoCAD drawing format; and CAD drawing format.

9. A method according to claim 5, wherein said elementary
information unit identifiers are determined by the content of said elementary
information units which they are assigned to.

10. A method according to claim 9, wherein said elementary 
information unit identifiers are solely determined by said content. 

11. A method according to claim 5, wherein said elementary
information units identifiers are at least partly determined by locations within an
information object of respective elementary information units to which they are
assigned.

12. A method according to claim 5, wherein said elementary
information units identifiers are at least partly determined by the content of an
elementary information unit in proximity to said elementary information units
to which they are assigned.
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13. A method according to claim 5, comprising storing said 
elementary information units identifiers in a database. 

14. A method according to claim 13, further comprising using said
elementary information units identifiers stored in said database for identifying
at least one further, unidentified, information object.

15. A method according to claim 13, further comprising using said
elementary information units identifiers stored in said database for comparing
information objects.

16. A method according to claim 5, comprising storing only some of 
said elementary information units identifiers in a database. 

17. A method according to claim 16, wherein said storing of only
some of said elementary information units identifiers in a database is to achieve
at least one of the following:
reduce storage cost;
increase efficiency of assigning of said elementary information units
identifiers to said elementary information units by only performing said
assignment for elementary information units identifiers that are stored in said
database; and
increase the efficiency of searching for said elementary information
units identifiers in said database.

18. A method according to claim 16, wherein said storage of only
some of said elementary information units identifiers in a database is done in a
manner that ensures that any area of a given size in said information object
contains a predetermined minimum number of said stored elementary
information units.

19. A method according to claim 18, wherein said given size is
dependent on properties of a respective information object.
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20. A method according to claim 19, wherein said properties of said 
information object comprise at least one of the following: 

importance; size; confidentiality level; and format.

21. A method according to claim 18, wherein said minimum number
is dependent on properties of said information object.

22. A method according to claim 21, wherein said properties of said
information object comprise at least one of the following:

importance; size; confidentiality level; and format.

23. A method according to claim 3, wherein said information objects 
comprise at least one compound information object, said compound information 
object comprising at least one of the following: 

a simple information object; a compound information object; an ordered 
set of compound information objects; an ordered set of simple information 
objects; and an ordered set of compound and simple information objects. 

24. A method according to claim 1, wherein said information 
comprises at least one of the following: 

numeric data; spreadsheet data; numeric spreadsheet data; textual 
spreadsheet data; word processor data; textual data; hyper text data; audio data; 
visual data; multimedia data; binary data; raw data; database data; video data; 
drawing data; chart data; picture data; and image data. 

25. A method according to claim 1, wherein monitoring is done in at
least one of the following:

Firewall; Web server; Web proxy; HTTP proxy; HTTP server; SMTP
gateway; SMTP server; Fax server; SOCKS proxy; Sniffer; Server; WAN
gateway; proxy; Router; Mail server; file server; client; file system; gateway;
router; application; operating system; database; database accessing utility;
database accessing server; Internal mail server; External mail server; Message
board; NNTP server; and an IRC server.
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26. A method according to claim 1 , wherein monitoring is carried 
out on at least one of the following traffic types: 

Instant messaging; IP; HTTP; Mail; TCP; UDP; Web; Streaming; Chat;
IRC; computer network; LAN; WAN; VPN; POP3; MAPI; FTP; NNTP; File
transfer; IMAP; SMTP; and Fax.

27. A method according to claim 1, wherein monitoring is done by at
least one of the following:

buffering; caching; forwarding; sniffing; and relaying.

28. A method according to claim 1, wherein monitoring comprises
at least one of the following:

blocking traffic; altering traffic; and altering traffic such as to invalidate
said traffic.

29. A method according to claim 1, comprising carrying out said
monitoring at a proxy.

30. A method according to claim 29, comprising routing traffic to be
monitored to said proxy.



31. A method according to claim 29, comprising blocking any traffic 
requiring monitoring which manages to bypass said proxy. 



32. A method according to claim 31, comprising using a firewall to
carry out said blocking.

33. A method according to claim 29, wherein said proxy is a SOCKS
proxy.

34. A method according to claim 29, wherein said proxy is an HTTP
proxy.
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35. A method according to claim 1, comprising monitoring instant 
messaging traffic. 

36. A method according to claim 35, comprising monitoring file 
distribution controlled by said instant messaging traffic. 

37. A method according to claim 36, comprising altering said instant 
messaging traffic controlling said file distribution, thereby to facilitate capturing 
said file distribution. 

38. A method according to claim 1, comprising using said deducing 
to attach to said information object an information object policy, said policy 
comprising at least one of the following: 

an allowed distribution of said information object; 
a restriction on distribution of said information object; 
an allowed storage of said information object; 
a restriction on storage of said information object; 
an action to be taken as a reaction to an event; 
an allowed usage of said information object; and 
a restriction on usage of said information object. 

39. A method according to claim 38, wherein said information
object policy comprises at least one action to be taken as a reaction to an event,
and wherein said action comprises at least one of the following:

preventing distribution of said information object; preventing storage of
said information object; preventing usage of said information object; reporting
distribution of said information object; reporting storage of said information
object; reporting usage of said information object; reporting; alerting about
distribution of said information object; alerting storage of said information
object; alerting usage of said information object; alerting; logging distribution
of said information object; logging storage of said information object; logging
usage of said information object; logging; notifying about distribution of said
information object; notifying about storage of said information object; notifying
about usage of said information object; notifying; notifying to an administrator;
notifying to a manager; notifying to a recipient; notifying to a sender; notifying
to an owner of said information object; quarantine; alerting an administrator;
alerting a manager; alerting a recipient; alerting a sender; alerting an owner of
said information object; reporting to an administrator; reporting to a manager;
reporting to a recipient; reporting to a sender; reporting to an owner of said
information object; encrypting said information object; changing said
information object; replacing said information object; and utilizing digital
rights management technology on said information object.

40. A method according to claim 38, wherein said information object
policy comprises at least one action to be taken as a reaction to an event, and
wherein said event comprises at least one of the following:

attempted distribution of said information object; attempted storage of
said information object; attempted usage of said information object;
distribution of said information object; storage of said information object; and
usage of said information object.

41. A method according to claim 38, wherein said information object
usage comprises at least one of the following:

copying an excerpt; editing; copying to clipboard; copying an excerpt to
clipboard; changing format; changing encoding; encryption; decryption;
changing digital management; opening by an application; and printing.

42. A method according to claim 38, wherein said information object
policy comprises placing a substantially imperceptible marking in said
information object, said marking comprising information content, and said
method comprising placing said marking, when indicated by said policy, before
allowing at least one of the following:

storage of said information object; usage of said information object; and
distribution of said information object.
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43. A method according to claim 42, wherein said information 
content for storage in said marking comprises at least one of the following: 
the identity of said information object;
the identity of a user performing the action in respect to said information
object;
the identity of a user authorizing the action in respect to said information
object;
the identity of a user overriding policy and approving the action in
respect to said information object; and
the identity of a user requesting the action in respect to said information
object.

44. A method according to claim 38, wherein said information object
policy further comprises changing said information object by at least one of the
following:

deleting part of said information object; replacing part of said
information object; and inserting an additional part to said information object
before allowing at least one of the following actions:

storage of said information object; usage of said information object; and
distribution of said information object.

45. A method according to claim 44, wherein said changing of said 
information object is done in order to eliminate parts having policies that do not 
allow for said action to be executed while they are in the document. 

46. A method according to claim 44, wherein said changing of said
information object is carried out in order to personalize said information object.

47. A method according to claim 44, wherein said changing of said
information object is carried out in order to customize said information object
for a specific use.

48. A method according to claim 44, wherein said changing of said
information object is done in a manner selected to achieve at least one of the
following:

preserving the coherency of said information object; seamlessness;
preserving the structure of said information object; preserving the linguistic
coherency of said information object; preserving the formatting style of said
information object; and preserving the pagination style of said information object.



49. A method according to claim 44, wherein said information
objects comprise compound information objects and wherein said changing of
said information object is made to constituent parts of a compound information
object.

50. A method according to claim 38, wherein said storing comprises
storage in at least one of the following:

a portable media device; a floppy disk; a hard drive; a portable hard
drive; a flash card; a flash device; disk on key; magnetic tape; magnetic media;
optic media; punched cards; a machine readable media; a CD; a DVD; a
firewire device; a USB device; and a hand held computer.

51. A method according to claim 38, wherein said policy comprises
distribution regulation, said distribution regulation being for regulating at least 
one of the following: 

sending said information object via mail; 
sending said information object via web mail; 
uploading said information object to a web server; 
uploading said information object to a FTP server; 
sending said information object via a file transfer application; 
sending said information object via an instant messaging application; 
sending said information object via a file transfer protocol; and 
sending said information object via an instant messaging protocol. 

52. A method according to claim 38, wherein said policy is
dependent on at least one of the following:

the domain of a respective information object; the identity of a system;
the identity of a user; the identity level of a user authorizing an action; the
identity of a user requesting an action; the identity of a user involved in an
action; the identity of a user receiving an information object; the authentication
level of a system; the authentication level of a user; the authentication level of a
user requesting an action; the authentication level of a user authorizing an
action; the authentication level of a user involved in an action; the
authentication level of a user receiving said information object; the
authentication level of a user sending said information object; the format of an
information object instance; an interface being used; an application being used;
encryption being used; digital rights management technology being used;
detection of transformation, wherein said transformation is operable to reduce
the ability to identify said transformed information object; information object
integrity; regular usage pattern; regular distribution pattern; regular storage
pattern; information path; consistency of an action with usage pattern; the
identity of a user overriding policy and authorizing the action in respect to said
information object; the authentication level of a user overriding policy and
authorizing the action in respect to said information object; the identity of a user
sending information object; information property of said information object;
language of said information object; representation of said information object;
operations done on said information object; identity of users involved along
the life cycle of said information object; application used on said information
object; transition channel of said information object; participant agents; virtual
location of a computer; logical location of a computer; physical location of a
computer; type of a computer; type of a laptop computer; type of a desktop
computer; type of a server computer; and owner identity.

53. A method according to claim 38, further comprising enabling at 
least one user to override at least one of decisions contained within said policy. 






54. A method according to claim 1, wherein said deducing
comprises utilizing conditional probabilities for at least one of the following:
identification of information objects;
classification of information objects; and
identification of a knowledge domain of information objects.
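
By way of non-limiting illustration, classification using conditional probabilities might be sketched as follows. The sketch assumes a naive Bayes estimate with Laplace smoothing over elementary information units; the claims do not name a particular estimator, and the function names `train` and `classify` and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count units per class. `examples` is an iterable of (label, units)."""
    class_counts = Counter()
    unit_counts = defaultdict(Counter)
    vocabulary = set()
    for label, units in examples:
        class_counts[label] += 1
        unit_counts[label].update(units)
        vocabulary.update(units)
    return class_counts, unit_counts, vocabulary

def classify(units, model):
    """Return the class maximizing P(class) * product of P(unit | class)."""
    class_counts, unit_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denominator = sum(unit_counts[label].values()) + len(vocabulary)
        for unit in units:
            # Laplace smoothing keeps unseen units from zeroing the product.
            score += math.log((unit_counts[label][unit] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: two knowledge domains.
model = train([
    ("finance", ["revenue", "forecast", "quarter"]),
    ("legal", ["contract", "liability", "clause"]),
])
```

Calling `classify(["revenue", "quarter"], model)` selects the class whose units make the observed units most probable.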

55. A method according to claim 38, wherein at least part of said 
policy is stored in a database. 


56. A method according to claim 1, wherein said deducing further
comprises utilizing keywords for at least one of the following:

identification of information objects; identification of elementary
information units; classification of information objects; and identification of the
domain of information objects.

57. A method according to claim 56, wherein said keywords are 
stored in a database. 

58. A method according to claim 56, wherein said keywords are
stored in at least one of the following forms:

hash value; raw string; and numeric representation.
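
As a non-limiting sketch of the hash value form, keywords might be stored only as digests so that the stored list does not itself disclose the keywords. SHA-256, the lower-casing step, and the names `keyword_hash`, `build_store` and `matching_words` are assumptions made for illustration.

```python
import hashlib

def keyword_hash(keyword):
    """Hash a keyword after simple normalization (an assumed step)."""
    normalized = keyword.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def build_store(keywords):
    """Keep only the hash values, never the raw strings."""
    return {keyword_hash(k) for k in keywords}

def matching_words(text, store):
    """Return the words of `text` whose hash appears in the store."""
    return [word for word in text.split() if keyword_hash(word) in store]

store = build_store(["Confidential", "merger"])
hits = matching_words("the merger is confidential", store)
```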

59. A method according to claim 38, wherein at least part of said
policy is defined in terms of a logic expression.

60. A method according to claim 59, wherein said expression is 
evaluated by lazy evaluation. 
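
A minimal, non-limiting sketch of lazy evaluation of a policy logic expression follows. Each condition is a callable that is invoked only if its value is still needed, relying on the short-circuit behavior of Python's built-in `all` and `any` over generators. The condition names, the `trace` list and the `example.com` domain are hypothetical.

```python
def all_of(*conditions):
    """AND node: stops at the first condition that evaluates false."""
    return lambda ctx: all(cond(ctx) for cond in conditions)

def any_of(*conditions):
    """OR node: stops at the first condition that evaluates true."""
    return lambda ctx: any(cond(ctx) for cond in conditions)

def recipient_is_internal(ctx):
    ctx["trace"].append("recipient_is_internal")
    return ctx["recipient"].endswith("@example.com")

def user_in_authorized_group(ctx):
    # Stands in for a costly lookup, e.g. a group-membership query.
    ctx["trace"].append("user_in_authorized_group")
    return ctx["user"] in ctx["authorized_users"]

allow_distribution = any_of(recipient_is_internal, user_in_authorized_group)

context = {
    "recipient": "bob@example.com",
    "user": "alice",
    "authorized_users": set(),
    "trace": [],
}
decision = allow_distribution(context)
```

Because the first condition already decides the expression, the second is never called, which `context["trace"]` records.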

61. A method according to claim 59, wherein at least some of the
variables in said logic expression comprise at least one of the following:

an external function; an external function based on group membership; 
and an external variable. 

62.

property of said information object, a property of a user, a property of a
computer, a property of an entity, and a hierarchy of calculations.

63. A method according to claim 38, wherein at least part of said
policy is defined in terms of a role, wherein said role consists of a property of at
least one of a user and a system and wherein said role further comprises at least
one authorization.

64. A method according to claim 38, wherein at least part of said 
policy is defined in terms of at least one of the following languages: 

a scripting language; an ordered calculation language; a programming 
language; an interpreted language; and a functional language. 

65. A method according to claim 64, wherein said at least one of said 
following languages comprises instructions for the operation of an ordered 
calculation resulting in at least one of the following: 

policy; instruction to perform an action; restriction; and allowance. 

66. A method according to claim 38, wherein said information object 
is a compound information object comprising constituent simple information 
objects, and a respective policy assigned to said information object comprises 
different policies for at least some of said constituent information objects. 


67. A method according to claim 1, wherein at least one user is
defined in an owner definition as an owner of said information object. 

68. A method according to claim 67, wherein said owner definition
is stored in a database.

69. A method according to claim 1, wherein said deducing further
comprises utilizing organizational structure information. 

70. A method according to claim 69, wherein said organizational
structure information comprises at least one of the following:

user superiority; working groups; organizational hierarchy; departmental
separation; and membership in working groups.

71. A method according to claim 38, comprising using
organizational structure information in order to assign a respective policy
object.

72. A method according to claim 69, wherein at least part of said
organizational structure information is stored in a database.

73. A method according to claim 69, wherein at least part of said
organizational structure information is used for information object
classification.

74. A method according to claim 69, wherein at least part of said 
organizational structure information is imported from at least one of the 
following: 

organizational data system; data management system; organizational
data management system; knowledge management system; user directory;
LDAP server; document; and an organizational chart.

75. A method according to claim 1, further comprising making use
of at least one user interface operable to assist in at least one of the following:

classification; policy definition; template definition; approving and
revising automatic template definition; importing organizational structure
information; revising organizational structure information; producing reports;
overriding policy decisions; and providing authorizations.

76. A method according to claim 38, comprising defining an 
information class as a group consisting of at least two information objects, said 
defining further comprising associating with said information class a
corresponding class policy being a policy shared by said information objects.

77. A method according to claim 76, wherein said information class
policy comprises at least a part of respective policies of said information objects
within said class.

78. A method according to claim 1, further comprising using
template information objects to represent commonly repeated information, such
that a template information object together with a difference information object
representing instance specific information are together formable to produce a
compound information object in which common and specific information are
respectively identifiable.

79. A method according to claim 78, comprising using said template
information object in identifying any unknown information object
comprising information corresponding to said template information object.



80. A method according to claim 78, wherein said template 
information object is a compound information object, wherein said template 
information object comprises at least one placeholder, and wherein said method
comprises replacing said placeholder by at least part of said difference 
information object when said difference information object and a respective 
template information object are combined. 
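
The combination of a template information object with a difference information object might be sketched, in a non-limiting way, as below. The template is modeled as a list of fixed parts and placeholders, and the difference object as the ordered parts that fill them; the `PLACEHOLDER` marker, the `combine` function and the sample text are hypothetical.

```python
PLACEHOLDER = object()  # hypothetical marker for a slot in the template

def combine(template, difference):
    """Fill each placeholder, in order, with the next part of `difference`.

    Returns (kind, text) pairs so that common and instance-specific
    information remain separately identifiable in the compound object.
    Raises StopIteration if `difference` has too few parts.
    """
    parts = iter(difference)
    compound = []
    for item in template:
        if item is PLACEHOLDER:
            compound.append(("specific", next(parts)))
        else:
            compound.append(("common", item))
    return compound

template = ["Dear ", PLACEHOLDER, ", your invoice total is ", PLACEHOLDER, "."]
compound = combine(template, ["Ms. Smith", "$120"])
rendered = "".join(text for _, text in compound)
```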

81. A method according to claim 80, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising specialization information to identify a respective specialization of 
said specialized placeholder. 

82. A method according to claim 80, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising a restriction about information objects permitted for replacing said 
specialized placeholder. 

83. A method according to claim 78, wherein said template
information object comprises at least one of a group comprising: a disclaimer; a
form; a header; a footer; a contract; and an invoice.

84. A method according to claim 81, wherein said specialized
placeholder comprises a restriction about information objects permitted for
replacing said specialized placeholder, and wherein said restriction comprises a
rule for excluding at least one of the following:
an object comprising numeric information;
an object comprising a word;
an object comprising a character;
an object comprising a digit;
an object comprising a sentence; and
an object comprising a simple information object.

85. A method according to claim 78, comprising defining a template 
information object and wherein said defining comprises automatically 
identifying a template information object candidate. 


86. A method according to claim 85, wherein said automatically 
identifying a template information object candidate comprises identification of 
shared elementary information units of at least two information objects. 

87. A method according to claim 85, wherein said step of
automatically identifying a template information object candidate comprises
identification of substantially similar information objects. 

88. A method according to claim 85, wherein said step of
automatically identifying a template information object candidate comprises the
use of at least one of text parsing; and text matching.

89. A method according to claim 78, comprising deriving at least a 
part of a respective information object policy associated with a template
instance information object from an information object policy of the respective
originating template information object. 

90. A method according to claim 38, wherein at least a part of an
information object policy of a respective information object is derived from a
default information object policy when said part of said information object
policy of said information object is not explicitly defined.

91. A method according to claim 5, comprising applying
preprocessing to said elementary information units before assigning identifiers
thereto. 

92. A method according to claim 91, wherein said preprocessing is
done in order to enhance at least one of efficiency and robustness. 
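
As a non-limiting sketch, preprocessing of a textual elementary information unit before an identifier is assigned might canonize case, remove punctuation and remove common words, so that superficially different units yield the same identifier. The `COMMON_WORDS` list and the `preprocess` function are assumptions for illustration only.

```python
import string

# Assumed, deliberately tiny list of common words.
COMMON_WORDS = {"a", "an", "the", "of", "and", "to", "is", "in"}

def preprocess(unit):
    """Canonize case, strip punctuation and drop common words."""
    text = unit.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [word for word in text.split() if word not in COMMON_WORDS]
    return " ".join(words)

first = preprocess("The  Merger, of the Two companies!")
second = preprocess("the merger of two companies")
```

Both inputs reduce to the same canonical string, which is what makes a subsequently assigned identifier robust to such variations.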



93. A method according to claim 91, wherein said preprocessing
comprises at least one of canonization; removal of common words; removal of
words not having a substantial effect on the meaning of the text; removal of
punctuation; correction of spelling; canonization of spelling; scene detection;
canonizing size; canonizing orientation; canonizing color; removing color;
reducing noise; enhancing area separation; enhancing borders; enhancing lines;
sharpening; blurring; removal of elementary information units substantially
similar to neighboring elementary information units; canonization of grammar;
and transformation to a phonetic representation.

94. A method according to claim 91, comprising carrying out said
preprocessing so as to ensure that any area of a given size in said information
object contains at least a predetermined number of said elementary information
units having an assigned elementary information unit identifier.

95. A method according to claim 94, wherein said given size is
dependent on properties of said information object.

96. A method according to claim 95, wherein said properties of said
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

97. A method according to claim 94, wherein said predetermined
number is dependent on properties of said information object.

98. A method according to claim 97, wherein said properties of said
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

99. A method according to claim 1, further comprising a stage of
detection of information objects having undergone transformations. 






100. 



101. A method according to claim 99, wherein said detection of 
information objects that have undergone transformation comprises detection of 
at least one of a group comprising: 

transformation artifacts; spelling mistakes; wrong grammar; wrong
punctuation; wrong capitalization; missing punctuation; missing capitalization;
irregular word distribution; lack of common words; predominance of unknown
words; inconsistent headers; headers inconsistent with file type; headers
inconsistent with file content; file type inconsistent with file content; irregular
distribution of characters; irregular distribution of words; irregular distribution
of character sequences; irregular distribution of word sequences; irregular
length of words; irregular length of sentences; irregular distribution of length of
words; irregular distribution of length of sentences; irregular file format;
irregular file encoding; unknown file format; unknown file encoding; mix of
non-alphabetic characters; unopenable file; action time; information object
creation time; information object update time; encryption; and an unexpectedly
high level of entropy.
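
One of the listed indicators, an unexpectedly high level of entropy, might be checked as in the following non-limiting sketch, which computes the Shannon entropy of the byte distribution in bits per byte. The threshold of 7.5 and the function names are assumptions; encrypted or compressed data tends to approach the maximum of 8, while plain text is typically much lower.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_transformed(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy exceeds an assumed threshold."""
    return shannon_entropy(data) > threshold

uniform = bytes(range(256))           # every byte value once
plain = b"plain english text " * 20   # repetitive, low-entropy text
```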

102. A method according to claim 5, comprising formulating
respective assigned elementary information unit identifier to be resilient to
small errors.

103. A method according to claim 5, wherein said assigning of
elementary information unit identifier utilizes image matching.

104. A method according to claim 5, wherein said assigning of
elementary information unit identifier comprises a mapping to a Euclidian
space.

105. A method according to claim 104, wherein said mapping to a
Euclidian space comprises approximating a pairwise difference between 
elementary information units. 

106. A method according to claim 105, wherein said approximating is 
such that a difference between two elementary information units approximates 
said pairwise difference between said two elementary information units. 

107. A method according to claim 105, wherein said approximation of 
said pairwise difference between elementary information units comprises an 
approximation of at least one of the following: 

semantic difference; distance measured by image matching; phonetic 
difference; and spelling difference. 

108. A method according to claim 1, wherein said information object 
is a knowledge object. 

109. A method according to claim 1, wherein said elementary
information unit is an elementary fact. 

110. A method according to claim 109, wherein said elementary fact
comprises at least one of the following:

sentence; database entry; representation independent description of
knowledge; modular description of knowledge; and abstract description of
knowledge.

111. A method according to claim 76, wherein said information class
is a knowledge class.

112. A method according to claim 1, further comprising a stage of
discerning lifecycle information about a respective information object.

113. A method according to claim 112, wherein said discerning of
information about the lifecycle of said information object comprises utilizing
information about sharing of at least one elementary information unit in said
information object, wherein said elementary information unit is shared with at
least one additional information object.

114. A method according to claim 112, wherein said discerning of
information about the lifecycle of said information object is based on at least 
one of a group comprising: file system date information; information about 
editing of said information object; and information about registration of said 
information object. 

115. A method according to claim 112, comprising utilizing said
information about the lifecycle of said information object for the creation of a 
lifecycle graph. 

116. A method according to claim 112, comprising utilizing said
information about the lifecycle of said information object to define at least part
of the policy of said information object said utilizing comprising identifying at 
least one other information object along said information object's lifecycle and 
examining a policy associated therewith. 






117. A method according to claim 5, wherein said assigning of said 
elementary information unit identifier is carried out a plurality of times, each 
time utilizing a different method for assigning of an elementary information unit 
identifier. 

118. A method according to claim 117, wherein said assigning of
elementary information unit identifier several times comprises storing the
elementary information unit identifiers assigned utilizing different methods
separately.

119. A method according to claim 117, wherein said assigning of said
elementary information unit identifier several times comprises storing the
elementary information unit identifiers such that identifiers assigned utilizing
different methods can be distinguished according to said method utilized to
assign them.

120. A method according to claim 117, wherein said different
methods are selected such as to optimize between at least any two of the
following:

storage space; search speed; capability to detect transformation;
capability to detect a specific transformation; resilience to transformation;
resolution of identification from among similar information objects; resolution
of identification of boundaries within compound information objects; resilience
to a specific transformation; and resilience to transformation.


121. A method according to claim 5, wherein said assigning of a
respective elementary information unit identifier comprises utilizing a method
having at least one of the following characteristics:

order sensitive to data in the elementary information unit; order
insensitive in the elementary information unit; utilizing changing definitions of
the elementary information unit such that said assigning of said elementary
information unit identifier is carried out a plurality of times using a plurality of
definitions; utilizing an exchangeable method of preprocessing, such that said
assigning of said elementary information unit identifier is carried out several
times; being omission resilient; being insertion resilient; being replacement
resilient; being dictionary based; being distribution based; being locality based;
being histogram based; and being n-gram based.
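
A non-limiting sketch of an n-gram based identifier assignment, with an order sensitive and an order insensitive variant, is given below. Word trigrams, SHA-1 truncated to twelve hexadecimal characters and the function name `ngram_identifiers` are assumptions chosen for brevity.

```python
import hashlib

def ngram_identifiers(unit, n=3, order_sensitive=True):
    """Return a set of short hash identifiers, one per word n-gram."""
    words = unit.lower().split()
    identifiers = set()
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        if not order_sensitive:
            # Sorting makes the identifier independent of word order
            # inside the n-gram.
            gram = sorted(gram)
        digest = hashlib.sha1(" ".join(gram).encode("utf-8")).hexdigest()
        identifiers.add(digest[:12])
    return identifiers

original = "the quarterly revenue forecast is confidential"
swapped = "the revenue quarterly forecast is confidential"

shared_sensitive = ngram_identifiers(original) & ngram_identifiers(swapped)
shared_insensitive = (ngram_identifiers(original, order_sensitive=False)
                      & ngram_identifiers(swapped, order_sensitive=False))
```

Swapping two adjacent words leaves fewer shared identifiers under the order sensitive variant than under the order insensitive one, which illustrates the trade-off between precision and resilience.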

122. A method according to claim 38, wherein an information object
policy comprises at least some information about one or more methods utilized
for assigning of an elementary information unit identifier to a respective
information object.

123. A method according to claim 117, wherein said assigning
utilizing different methods comprises utilizing said different methods
sequentially until a predetermined stop condition is reached.

124. A method according to claim 5, wherein said information object
comprises spreadsheet data, and wherein said assigning of said elementary
information unit identifier assigned to said information object comprises
utilizing a method comprising at least one of the following characteristics:

invariance to linear transformation; invariance to reordering; invariance
to permutation; resilience to linear transformation; resilience to reordering;
resilience to permutation; resilience to minor changes; resilience to cuts;
utilizing of statistic moment; utilizing of statistic moment for a table; utilizing
statistic moment for a row; utilizing statistic moment for a column; and utilizing
a mathematical descriptor of the information object data.
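
By way of non-limiting example, a column descriptor built from standardized statistical moments is unchanged by reordering of the rows and, up to floating-point rounding, by a linear transformation with positive scale (for instance a change of currency or units). The choice of skewness and kurtosis and the function name `column_descriptor` are assumptions.

```python
import statistics

def column_descriptor(values, digits=6):
    """Return (skewness, kurtosis) of a numeric column.

    Standardizing by mean and standard deviation removes the effect of
    shifting and positive scaling; averaging removes the effect of order.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return (0.0, 0.0)  # constant column: moments are undefined
    standardized = [(v - mean) / stdev for v in values]
    skewness = statistics.fmean(z ** 3 for z in standardized)
    kurtosis = statistics.fmean(z ** 4 for z in standardized)
    return (round(skewness, digits), round(kurtosis, digits))

column = [10.0, 12.5, 9.0, 30.0, 11.0]
base = column_descriptor(column)
reordered = column_descriptor(list(reversed(column)))
rescaled = column_descriptor([3 * v + 7 for v in column])
```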


125. A method according to claim 5, comprising utilizing said 
elementary information unit identifiers for said information object identification 
using a technique having at least one of the following characteristics: omission 
resilience; insertion resilience; replacement resilience; being dictionary based; 
being distribution based; being locality based; being based on the size of
elementary information units; being based on the size of information objects;
resilience to linear transformation; resilience to reordering; resilience to 
permutation; resilience to minor changes; resilience to cuts; being histogram 
based; and being n-gram based. 


126. A method according to claim 1, further comprising utilizing a
client.

127. A method according to claim 126, wherein said client comprises
at least one of the following:

end point software; end point hardware; tamper resistant software;
tamper resistant hardware;
client side software; and client side hardware.

128. A method according to claim 126, comprising utilizing said
client for at least one of the following:

monitoring of client side storage; monitoring of client side access;
monitoring of client side usage;
monitoring of client side distribution;
monitoring of copying of information object excerpts;
monitoring of clipboard;
monitoring of at least one application;
monitoring of at least one interface;
control of at least one application;
control of at least one interface;
control of clipboard;
control of copying of information object excerpts;
control of client side storage;
control of client side access;
control of client side usage; and
control of client side distribution.

129. A method according to claim 1, comprising utilizing comparing
of at least two information objects to calculate pairwise similarity between
objects.

130. A method according to claim 129, comprising utilizing said 
pairwise similarity to map said information objects to a space. 






131. A method according to claim 130, wherein said space is an 
Euclidian space, and wherein the closeness between any two objects within said 
Euclidian space is approximately proportional to said pairwise similarity 
between said information objects. 

132. A method according to claim 130, wherein said space is a
weighted graph, and wherein the weight of an edge between any two objects
within said graph space is approximately proportional to said pairwise similarity
between said information objects.

133. A method according to claim 130, wherein said space is a graph,
and wherein the existence of an edge between any two objects within said graph
space is dependent on said pairwise similarity between said information objects.
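
The mapping of information objects to a graph on the basis of pairwise similarity might be sketched as follows. This non-limiting sketch assumes that each object is represented by a set of elementary information unit identifiers and that similarity is measured by the Jaccard index; an edge exists only when the similarity reaches an assumed threshold, and the similarity itself is kept as the edge weight. All names and sample data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Pairwise similarity of two identifier sets, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity_graph(objects, threshold=0.5):
    """Return {(name1, name2): weight} for pairs at or above the threshold."""
    edges = {}
    for (name1, ids1), (name2, ids2) in combinations(objects.items(), 2):
        weight = jaccard(ids1, ids2)
        if weight >= threshold:
            edges[(name1, name2)] = weight
    return edges

objects = {
    "contract_v1": {"a", "b", "c", "d"},
    "contract_v2": {"a", "b", "c", "e"},
    "memo": {"x", "y"},
}
graph = similarity_graph(objects)
```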


134. A method according to claim 130, wherein said space is utilized 
to identify at least one similarity information class, wherein said information 
class consists of at least two information objects, wherein said information class 
policy is a policy shared by the information class, and wherein said similarity
information class is bounded within said space.

135. A method according to claim 130, comprising utilizing said 
space to identify at least one information object substantially similar to an
unidentified information object. 


136. A method according to claim 130, comprising using said space to 
identify at least one other information object substantially similar to an 
information object for which policy is not known, thereby to obtain a policy 
associated with said other information object to use as basis for a policy for said
information object.

137. A method according to claim 1, comprising storing information
about said information object in a database. 

138. A method according to claim 1, further comprising extracting a
descriptor of said information object, based on statistical analysis of said
information object.

139. A method according to claim 1, comprising storing the order of
said elementary information units within said information object in a database.

140. A method according to claim 139, comprising using said order
for identification of said information object.

141. A method according to claim 1, further comprising interfacing at
least one of an information management system; and a document management 
system. 

142. A method according to claim 1, further comprising tracking at
least one of the following:

usage patterns; storage patterns; and distribution patterns.

143. A method according to claim 142, wherein said tracking is
carried out to infer information about at least one of the following:

normal usage patterns; normal storage patterns; normal distribution 
patterns; irregular usage patterns; irregular storage patterns; and irregular 
distribution patterns. 

144. A method according to claim 143, wherein said inferred
information is used to define at least part of a policy.

145. A method according to claim 143, comprising using said inferred 
information for information object classification. 






146. A method according to claim 1, further comprising logging. 



147. A method according to claim 146, wherein said logging
comprises logging of at least one of the following:

actions; events; and information objects identification.

148. A method according to claim 146, wherein at least part of said 
logging is controlled by a policy. 

149. A method according to claim 146, wherein at least part of said
logging is stored in a database.

150. A method according to claim 146, comprising utilizing said
logging to augment lifecycle information for said information object.

151. A method according to claim 1, further comprising assessing the
integrity of at least one information object, wherein said integrity assessment 
consists of comparing said information object with a version of said information 
object for which integrity is assured. 
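
A minimal, non-limiting sketch of such an integrity assessment follows. Instead of comparing the two versions byte by byte, the sketch compares a SHA-256 digest of the candidate with a digest recorded from the version whose integrity is assured; this substitution, and the names `digest` and `integrity_ok`, are assumptions for illustration.

```python
import hashlib
import hmac

def digest(data: bytes) -> str:
    """Digest recorded for the version whose integrity is assured."""
    return hashlib.sha256(data).hexdigest()

def integrity_ok(candidate: bytes, trusted_digest: str) -> bool:
    """True when the candidate matches the assured version's digest."""
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(digest(candidate), trusted_digest)

trusted = digest(b"Q3 report v1")
unchanged = integrity_ok(b"Q3 report v1", trusted)
tampered = integrity_ok(b"Q3 report v1 (edited)", trusted)
```

A negative result could then be used to withhold distribution, storage or usage of the information object.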


152. A method according to claim 151, further comprising issuing a 
certificate of said integrity for at least one information object. 

153. A method according to claim 152, wherein said certificate is a
cryptographic certificate.

154. A method according to claim 151, further comprising replacing 
said information object with said version of said information object for which 
said integrity is assured. 


155. A method according to claim 151, comprising identifying when 
said integrity of said information object is not satisfactory, and in such a case 
not allowing distribution of said information object. 

156. A method according to claim 151, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing storage of said information object. 

157. A method according to claim 151, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing usage of said information object.

158. A method according to claim 1, further comprising defining at
least one constituent information object to be an ignored information object, and
wherein, whenever said to be ignored information object is an element of a
compound information, ignoring said object in identification of said compound
information object.




WO 2004/040464 PCT/IL2003/000889

159. A method according to claim 38, further comprising changing
access control information in accordance with said policy. 



160. A method according to claim 1, further comprising not allowing
usage of respective ones of said information objects outside an organization.

161. A method according to claim 1, further comprising not allowing
storage of respective ones of said information object outside an organization. 

162. A method according to claim 1, further comprising not allowing
distribution of respective ones of said information object outside an
organization. 

163. A method according to claim 40, wherein said policy comprises
at least one mandatory lifecycle.

164. A method according to claim 163, wherein said action is
dependent on the matching of said mandatory lifecycle with a lifecycle of a 
respective event. 


165. A method according to claim 163, wherein said mandatory
lifecycle comprises at least one mandatory recipient of said information object; 
and an order of events concerning said information object. 

166. A method according to claim 44, wherein said inserting an
additional part to said information object comprises inserting at least one of the
following: a header; a footer; and a disclaimer. 

167. A method according to claim 38 or claim 52, comprising
defining areas and wherein said policy is dependent on whether an action is
taken inside a user-defined area.

168. A method according to claim 38 or claim 52, comprising
defining areas and wherein said policy is dependent on whether an event occurs 
inside a user-defined area. 

169. A method according to claim 1 or claim 5, comprising using said
deducing to locate at least one information object with similar content to a given
information object. 

170. A method according to claim 38, comprising attaching a
respective policy to information objects according to their logical location
within an information storage medium.

171. A method according to claim 170, further comprising utilizing a
crawler for automatic location of information objects. 






172. A method according to claim 171, wherein said information 
storage medium is a file system. 
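
Attaching policies according to logical location by means of a crawler over a file system might be sketched, without limitation, as follows. The crawler walks a directory tree with `os.walk` and gives each file the policy of the longest matching location prefix, falling back to an assumed default; the policy names, the prefix mapping and the function name are hypothetical.

```python
import os

def assign_policies_by_location(root, location_policies,
                                default_policy="unrestricted"):
    """Map each file path (relative to `root`) to a policy name.

    `location_policies` maps a relative directory prefix to a policy;
    the longest matching prefix wins.
    """
    assigned = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            policy, best_length = default_policy, -1
            for prefix, candidate in location_policies.items():
                norm = os.path.normpath(prefix)
                inside = rel_path == norm or rel_path.startswith(norm + os.sep)
                if inside and len(norm) > best_length:
                    policy, best_length = candidate, len(norm)
            assigned[rel_path] = policy
    return assigned
```

For example, calling the function with `{"finance": "internal-only"}` gives every file below the `finance` directory the `internal-only` policy and leaves all other files with the default.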



173. A method according to claim 169, wherein said locating is done
in an information storage medium.



174. A method according to claim 173, further comprising utilizing a
crawler for automatic location of information objects within said information
storage medium.

175. A method according to claim 173, wherein said information
storage medium comprises at least one file system. 

176. A method for information identification comprising:

Finding elementary information units within said information object;
and

Deducing information about the identity of said information object from
identification of said elementary information units found within said
information object.

177. A method according to claim 176, wherein said information
objects comprise at least one simple information object, said simple information 
object comprising one of the following: 

an elementary information unit;

a set of elementary information units; and 

an ordered set of elementary information units. 

178. A method according to claim 176, wherein said elementary
information units comprise at least one of the following:

a sentence; a sequence of words; a word; a sequence of characters; a
character; a sequence of numbers; a number; a sequence of digits; a digit; a
vector; a curve; a pixel; a block of pixels; an audio frame; a musical note; a
musical bar; a visual object; a sequence of video frames; a sequence of musical
notes; a sequence of musical bars; and a video frame.

179. A method according to claim 176, further comprising assigning
elementary information unit identifiers to elementary information units after
identification.

180. A method according to claim 179, wherein said elementary
information unit identifiers are utilized in said deducing.
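
As a hedged illustration of claims 176 to 180 (a sketch, not the claimed implementation), the following Python fragment treats sentences as the elementary information units, uses a SHA-256 digest of each normalized sentence as its identifier, and deduces the identity of an unknown object by counting identifiers it shares with registered objects. The names `unit_ids`, `register` and `identify`, the sentence splitter, and the choice of hash are all assumptions made for illustration.

```python
import hashlib
import re
from collections import Counter

def unit_ids(text):
    """Split text into sentences (the units assumed here) and hash each one."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [hashlib.sha256(s.encode("utf-8")).hexdigest() for s in sentences]

def register(index, object_name, text):
    """Record which known object each unit identifier belongs to."""
    for uid in unit_ids(text):
        index.setdefault(uid, set()).add(object_name)

def identify(index, text):
    """Return the known object sharing the most unit identifiers, or None."""
    votes = Counter()
    for uid in unit_ids(text):
        for name in index.get(uid, ()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical registered objects.
index = {}
register(index, "merger_memo", "The merger closes in May. Keep this confidential.")
register(index, "lunch_menu", "Soup is served at noon. Dessert is optional.")
```

In this sketch an excerpt pasted into an unrelated message is still attributed to its source object, because the identifier depends only on the content of the unit and not on the surrounding text.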

181. A method according to claim 176, wherein said information
object identification is carried out on an instance of said information object, said
information object instance being said information object in a specific format. 

182. A method according to claim 181, wherein said format comprises
at least one of the following:

jpeg image; gif image; Word document format; Lotus Notes format;
mpeg format; text format; rich text format; Unicode text format; multi-byte text
encoding format; formatted text format; ASCII text format; HTML; XML;
PDF; postscript; MS-Excel spreadsheet; MS-Excel drawing; MS-Visio drawing;
Photoshop drawing; AutoCAD drawing format; and CAD drawing format.

183. A method according to claim 179, wherein said elementary
information unit identifiers are determined by the content of said elementary 
information units which they are assigned to. 

184. A method according to claim 183, wherein said elementary
information unit identifiers are solely determined by said content. 

185. A method according to claim 179, wherein said elementary
information unit identifiers are at least partly determined by locations within an
information object of respective elementary information units to which they are 
assigned. 

186. A method according to claim 179, wherein said elementary 
information unit identifiers are at least partly determined by the content of an
elementary information unit in proximity to said elementary information units 
to which they are assigned. 

187. A method according to claim 179, comprising storing said
elementary information unit identifiers in a database.

188. A method according to claim 187, further comprising using said
elementary information unit identifiers stored in said database for identifying
at least one further, unidentified, information object.

189. A method according to claim 187, further comprising using said
elementary information unit identifiers stored in said database for comparing
information objects.

190. A method according to claim 179, comprising storing only some
of said elementary information unit identifiers in a database.

191. A method according to claim 190, wherein said storing of only
some of said elementary information unit identifiers in a database is to achieve
at least one of the following:

reduce storage cost;

increase efficiency of assigning of said elementary information unit
identifiers to said elementary information units by only performing said
assignment for elementary information unit identifiers that are stored in said
database; and

increase the efficiency of searching for said elementary information
unit identifiers in said database.



192. A method according to claim 190, wherein said storage of only
some of said elementary information unit identifiers in a database is done in a
manner that ensures that any area of a given size in said information object
contains a predetermined minimum number of said stored elementary
information units.
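
One way to picture the guarantee in claim 192 is a winnowing-style selection, which is a technique swapped in here for illustration and is not stated in the claims. The sketch below assumes the identifiers are integers listed in document order, treats an "area of a given size" as `window` consecutive units, and keeps the `min_count` smallest identifiers of every window; `select_positions` and its parameters are hypothetical names.

```python
def select_positions(ids, window, min_count):
    """Pick positions so every run of `window` consecutive units has >= min_count picked."""
    if min_count > window:
        raise ValueError("min_count cannot exceed window")
    chosen = set()
    for start in range(max(1, len(ids) - window + 1)):
        span = range(start, min(start + window, len(ids)))
        # Keep the min_count smallest identifiers in this window (ties broken by position).
        best = sorted(span, key=lambda i: (ids[i], i))[:min_count]
        chosen.update(best)
    return sorted(chosen)

# Hypothetical identifiers of ten consecutive elementary information units.
ids = [52, 7, 91, 33, 18, 64, 5, 77, 40, 12]
stored = select_positions(ids, window=4, min_count=2)
```

Because neighbouring windows tend to agree on their smallest identifiers, only a subset of positions is stored while every window still meets the minimum.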



193. A method according to claim 192, wherein said given size is 
dependent on properties of a respective information object. 

194. A method according to claim 193, wherein said properties of
said information object comprise at least one of the following: 

importance; size; confidentiality level; and format. 

195. A method according to claim 192, wherein said minimum
number is dependent on properties of said information object.

196. A method according to claim 195, wherein said properties of said
information object comprise at least one of the following:

importance; size; confidentiality level; and format.

197. A method according to claim 177, wherein said information
objects comprise at least one compound information object, said compound
information object comprising at least one of the following:

a simple information object; a compound information object; an ordered
set of compound information objects; an ordered set of simple information 
objects; and an ordered set of compound and simple information objects. 

198. A method according to claim 176, wherein said information
comprises at least one of the following:

numeric data; spreadsheet data; numeric spreadsheet data; textual
spreadsheet data; word processor data; textual data; hypertext data; audio data;
visual data; multimedia data; binary data; raw data; database data; video data;
drawing data; chart data; picture data; and image data.

199. A method according to claim 176, comprising using said 
deducing to attach to said information object an information object policy, said 
policy comprising at least one of the following: 

an allowed distribution of said information object;

a restriction on distribution of said information object;

an allowed storage of said information object;

a restriction on storage of said information object;

an action to be taken as a reaction to an event;

an allowed usage of said information object; and

a restriction on usage of said information object.

200. A method according to claim 199, wherein said information
object policy comprises at least one action to be taken as a reaction to an event,
and wherein said action comprises at least one of the following:

preventing distribution of said information object; preventing storage of
said information object; preventing usage of said information object; reporting
distribution of said information object; reporting storage of said information
object; reporting usage of said information object; reporting; alerting about
distribution of said information object; alerting storage of said information
object; alerting usage of said information object; alerting; logging distribution
of said information object; logging storage of said information object; logging
usage of said information object; logging; notifying about distribution of said
information object; notifying about storage of said information object; notifying
about usage of said information object; notifying; notifying to an administrator;
notifying to a manager; notifying to a recipient; notifying to a sender; notifying
to an owner of said information object; quarantine; alerting an administrator;
alerting a manager; alerting a recipient; alerting a sender; alerting an owner of 
said information object; reporting to an administrator; reporting to a manager; 
reporting to a recipient; reporting to a sender; reporting to an owner of said 
information object; encrypting said information object; changing said 
information object; replacing said information object; and utilizing digital 
rights management technology on said information object. 



201. A method according to claim 199, wherein said information
object policy comprises at least one action to be taken as a reaction to an event,
and wherein said event comprises at least one of the following:

attempted distribution of said information object; attempted storage of
said information object; attempted usage of said information object;
distribution of said information object; storage of said information object;
and usage of said information object.

202. A method according to claim 199, wherein said information
object usage comprises at least one of the following:

copying an excerpt; editing; copying to clipboard; copying an excerpt to 
clipboard; changing format; changing encoding; encryption; decryption; 
changing digital management; opening by an application; and printing. 

203. A method according to claim 199, wherein said information
object policy comprises placing a substantially imperceptible marking in said
information object, said marking comprising information content, and said
method comprising placing said marking, when indicated by said policy, before
allowing at least one of the following:

storage of said information object; usage of said information object; and 
distribution of said information object.

204. A method according to claim 203, wherein said information
content for storage in said marking comprises at least one of the following:

the identity of said information object;

the identity of a user performing the action in respect to said information
object;

the identity of a user authorizing the action in respect to said information
object;

the identity of a user overriding policy and approving the action in
respect to said information object; and

the identity of a user requesting the action in respect to said information
object.

205. A method according to claim 199, wherein said information 
object policy further comprises changing said information object by at least one 
of the following:

deleting part of said information object; replacing part of said
information object; and inserting an additional part to said information object
before allowing at least one of the following actions:

storage of said information object; usage of said information object; and
distribution of said information object.

206. A method according to claim 205, wherein said changing of said 
information object is done in order to eliminate parts having policies that do not 
allow for said action to be executed while they are in the document. 

207. A method according to claim 205, wherein said changing of said
information object is carried out in order to personalize said information object. 

208. A method according to claim 205, wherein said changing of said 
information object is carried out in order to customize said information object
for a specific use.

209. A method according to claim 205, wherein said changing of said
information object is done in a manner selected to achieve at least one of the
following:

preserving the coherency of said information object; seamlessness;
preserving the structure of said information object; preserving the linguistic
coherency of said information object; preserving the formatting style of said
information object; and preserving the pagination style of said information object.



210. A method according to claim 205, wherein said information 
objects comprise compound information objects and wherein said changing of 
said information object is made to constituent parts of a compound information 
object. 

211. A method according to claim 199, wherein said storing
comprises storage in at least one of the following:

a portable media device; a floppy disk; a hard drive; a portable hard
drive; a flash card; a flash device; disk on key; magnetic tape; magnetic media;
optic media; punched cards; a machine readable media; a CD; a DVD; a
firewire device; a USB device; and a hand held computer.

212. A method according to claim 199, wherein said policy comprises 
distribution regulation, said distribution regulation being for regulating at least 
one of the following: 

sending said information object via mail;
sending said information object via web mail;
uploading said information object to a web server;
uploading said information object to an FTP server;
sending said information object via a file transfer application;
sending said information object via an instant messaging application;
sending said information object via a file transfer protocol; and
sending said information object via an instant messaging protocol.

213. A method according to claim 199, wherein said policy is
dependent on at least one of the following:

the domain of a respective information object; the identity of a system;
the identity of a user; the identity level of a user authorizing an action; the
identity of a user requesting an action; the identity of a user involved in an
action; the identity of a user receiving an information object; the authentication
level of a system; the authentication level of a user; the authentication level of a
user requesting an action; the authentication level of a user authorizing an
action; the authentication level of a user involved in an action; the
authentication level of a user receiving said information object; the
authentication level of a user sending said information object; the format of an
information object instance; an interface being used; an application being used;
encryption being used; digital rights management technology being used;
detection of transformation, wherein said transformation is operable to reduce
the ability to identify said transformed information object; information object
integrity; regular usage pattern; regular distribution pattern; regular storage
pattern; information path; consistency of an action with usage pattern; the
identity of a user overriding policy and authorizing the action in respect to said
information object; the authentication level of a user overriding policy and
authorizing the action in respect to said information object; the identity of a user
sending information object; information property of said information object;
language of said information object; representation of said information object;
operations done on said information object; identity of users involved along
the life cycle of said information object; application used on said information
object; transition channel of said information object; participant agents; virtual
location of a computer; logical location of a computer; physical location of a
computer; type of a computer; type of a laptop computer; type of a desktop
computer; type of a server computer; and owner identity.

214. A method according to claim 199, further comprising enabling at
least one user to override at least one of the decisions contained within said policy.



215. A method according to claim 176, wherein said deducing
comprises utilizing conditional probabilities for at least one of the following:

identification of information objects;
classification of information objects; and
identification of a knowledge domain of information objects.

216. A method according to claim 199, wherein at least part of said 
policy is stored in a database. 

217. A method according to claim 176, wherein said deducing further
comprises utilizing keywords for at least one of the following:

identification of information objects; identification of elementary
information units; classification of information objects; and identification of the
domain of information objects.

218. A method according to claim 217, wherein said keywords are 
stored in a database. 

219. A method according to claim 217, wherein said keywords are
stored in at least one of the following forms:

hash value; raw string; and numeric representation.

220. A method according to claim 199, wherein at least part of said
policy is defined in terms of a logic expression.

221. A method according to claim 220, wherein said expression is
evaluated by lazy evaluation.
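
To illustrate what lazy evaluation of a policy logic expression (claims 220 and 221) can look like, the sketch below wraps each policy variable in a zero-argument function and combines them with short-circuiting AND and OR, so that a variable is only computed if its value can still affect the result. The helper names `var`, `all_of` and `any_of`, and the example policy, are assumptions made for illustration only.

```python
calls = []

def var(name, value):
    """Wrap a policy variable as a thunk that records when it is actually evaluated."""
    def thunk():
        calls.append(name)
        return value
    return thunk

def all_of(*terms):
    """Lazy AND: stops at the first term that evaluates to False."""
    return lambda: all(t() for t in terms)

def any_of(*terms):
    """Lazy OR: stops at the first term that evaluates to True."""
    return lambda: any(t() for t in terms)

# Hypothetical policy: allow sending if the user is the owner,
# or if the recipient is internal and the object is not confidential.
allow_send = any_of(
    var("is_owner", True),
    all_of(var("recipient_internal", True), var("not_confidential", False)),
)
result = allow_send()
```

Here only `is_owner` is evaluated; the remaining variables, which in practice might require a directory lookup or a content scan, are never touched.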

222. A method according to claim 220, wherein at least some of the
variables in said logic expression comprise at least one of the following:

an external function; an external function based on group membership;
and an external variable.

223.

property of said information object, a property of a user, a property of a
computer, a property of an entity, and a hierarchy of calculations.

224. A method according to claim 199, wherein at least part of said
policy is defined in terms of a role, wherein said role consists of a property of at 
least one of a user and a system and wherein said role further comprises at least 
one authorization. 

225. A method according to claim 199, wherein at least part of said
policy is defined in terms of at least one of the following languages: 

a scripting language; an ordered calculation language; a programming 
language; an interpreted language; and a functional language. 

226. A method according to claim 225, wherein said at least one of 
said following languages comprises instructions for the operation of an ordered 
calculation resulting in at least one of the following: 

policy; instruction to perform an action; restriction; and allowance. 

227. A method according to claim 199, wherein said information
object is a compound information object comprising constituent simple
information objects, and a respective policy assigned to said information object
comprises different policies for at least some of said constituent information
objects.

228. A method according to claim 176, wherein at least one user is 
defined in an owner definition as an owner of said information object. 

229. A method according to claim 228, wherein said owner definition
is stored in a database.

230. A method according to claim 176, wherein said deducing further 
comprises utilizing organizational structure information. 



231. A method according to claim 230, wherein said organizational
structure information comprises at least one of the following:

user superiority; working groups; organizational hierarchy; departmental
separation; and membership in working groups.

232. A method according to claim 199, comprising using
organizational structure information in order to assign a respective policy



233. A method according to claim 230, wherein at least part of said 
organizational structure information is stored in a database. 

234. A method according to claim 230, wherein at least part of said
organizational structure information is used for information object 
classification. 

235. A method according to claim 230, wherein at least part of said 
organizational structure information is imported from at least one of the 
following: 

organizational data system; data management system; organizational
data management system; knowledge management system; user directory;
LDAP server; document; and an organizational chart.

236. A method according to claim 176, further comprising making 
use of at least one user interface operable to assist in at least one of the 
following: 

classification; policy definition; template definition; approving and
revising automatic template definition; importing organizational structure
information; revising organizational structure information; producing reports;
overriding policy decisions; and providing authorizations.

237. A method according to claim 199, comprising defining an
information class as a group consisting of at least two information objects, said
defining further comprising associating with said information class a
corresponding class policy being a policy shared by said information objects. 



object.

238. A method according to claim 237, wherein said information class
policy comprises at least a part of respective policies of said information objects 
within said class. 

239. A method according to claim 176, further comprising using
template information objects to represent commonly repeated information, such
that a template information object together with a difference information object
representing instance specific information are together formable to produce a
compound information object in which common and specific information are
respectively identifiable.

240. A method according to claim 239, comprising using said 
template information object in identifying any unknown information object
comprising information corresponding to said template information object. 

241. A method according to claim 239, wherein said template
information object is a compound information object, wherein said template
information object comprises at least one placeholder, and wherein said method
comprises replacing said placeholder by at least part of said difference
information object when said difference information object and a respective
template information object are combined.
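
The relationship between a template information object, its placeholders, and a difference information object (claims 239 to 241) can be sketched as follows. The `{{name}}` placeholder syntax, the dictionary used as the difference object, and the function names `combine` and `split_instance` are all assumptions chosen for illustration; the claims do not prescribe any particular representation.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def combine(template, difference):
    """Replace each {{name}} placeholder with the matching part of the difference object."""
    return PLACEHOLDER.sub(lambda m: difference[m.group(1)], template)

def split_instance(template, instance):
    """Recover the instance-specific parts of a document built from the template."""
    names = PLACEHOLDER.findall(template)
    # With a capturing group, split() alternates: literal, name, literal, name, ...
    parts = PLACEHOLDER.split(template)
    pattern = "".join(
        re.escape(p) if i % 2 == 0 else "(.+?)" for i, p in enumerate(parts)
    )
    match = re.fullmatch(pattern, instance, flags=re.DOTALL)
    return dict(zip(names, match.groups())) if match else None

# Hypothetical invoice template and instance-specific data.
template = "Invoice for {{customer}}. Amount due: {{amount}}. Pay within 30 days."
difference = {"customer": "Acme Ltd", "amount": "1,200 USD"}
document = combine(template, difference)
```

Running `split_instance` on a produced document separates the common template text from the instance-specific values, which is the sense in which the two kinds of information stay respectively identifiable.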

242. A method according to claim 241, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising specialization information to identify a respective specialization of
said specialized placeholder.

243. A method according to claim 241, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising a restriction about information objects permitted for replacing said
specialized placeholder.

244. A method according to claim 239, wherein said template
information object comprises at least one of a group comprising: a disclaimer; a
form; a header; a footer; a contract; and an invoice.

245. A method according to claim 242, wherein said specialized
placeholder comprises a restriction about information objects permitted for
replacing said specialized placeholder, and wherein said restriction comprises a
rule for excluding at least one of the following:

an object comprising numeric information;

an object comprising a word;

an object comprising a character;

an object comprising a digit;

an object comprising a sentence; and

an object comprising a simple information object.

246. A method according to claim 239, comprising defining a
template information object and wherein said defining comprises automatically 
identifying a template information object candidate. 

247. A method according to claim 246, wherein said automatically
identifying a template information object candidate comprises identification of
shared elementary information units of at least two information objects. 

248. A method according to claim 246, wherein said step of 
automatically identifying a template information object candidate comprises
identification of substantially similar information objects.

249. A method according to claim 246, wherein said step of 
automatically identifying a template information object candidate comprises the
use of at least one of text parsing; and text matching.

250. A method according to claim 239, comprising deriving at least a 
part of a respective information object policy associated with a template
instance information object from an information object policy of the respective
originating template information object.

251. A method according to claim 199, wherein at least a part of an
information object policy of a respective information object is derived from a
default information object policy when said part of said information object 
policy of said information object is not explicitly defined. 



252. A method according to claim 179, comprising applying
preprocessing to said elementary information units before assigning identifiers
thereto.

253. A method according to claim 252, wherein said preprocessing is 
done in order to enhance at least one of efficiency and robustness. 

254. A method according to claim 252, wherein said preprocessing
comprises at least one of canonization; removal of common words; removal of
words not having a substantial effect on the meaning of the text; removal of
punctuation; correction of spelling; canonization of spelling; scene detection;
canonizing size; canonizing orientation; canonizing color; removing color;
reducing noise; enhancing area separation; enhancing borders; enhancing lines;
sharpening; blurring; removal of elementary information units substantially
similar to neighboring elementary information units; canonization of grammar;
and transformation to a phonetic representation.
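
For textual units, a few of the preprocessing steps listed in claim 254 (canonization, removal of punctuation, removal of common words) can be sketched in a handful of lines. The function name `canonize` and the tiny `COMMON_WORDS` list are assumptions made for illustration; a real deployment would presumably use a fuller word list and further steps.

```python
import re
import unicodedata

# A tiny illustrative list of common words (assumed, not from the source).
COMMON_WORDS = {"a", "an", "the", "of", "to", "and", "is", "in"}

def canonize(unit):
    """Normalize a textual unit so trivial edits do not change its identifier."""
    text = unicodedata.normalize("NFKC", unit).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    words = [w for w in text.split() if w not in COMMON_WORDS]
    return " ".join(words)
```

Two units that differ only in case, spacing, punctuation, or common words map to the same canonical string, so an identifier computed from that string is robust to such edits.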

255. A method according to claim 252, comprising carrying out said
preprocessing so as to ensure that any area of a given size in said information 
object contains at least a predetermined number of said elementary information 
units having an assigned elementary information unit identifier. 

256. A method according to claim 255, wherein said given size is
dependent on properties of said information object.

257. A method according to claim 256, wherein said properties of said
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

258. A method according to claim 255, wherein said predetermined
number is dependent on properties of said information object.

259. A method according to claim 258, wherein said properties of said 
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

260. A method according to claim 176, further comprising a stage of 
detection of information objects having undergone transformations. 

261.

262. A method according to claim 260, wherein said detection of
information objects that have undergone transformation comprises detection of
at least one of a group comprising:

transformation artifacts; spelling mistakes; wrong grammar; wrong
punctuation; wrong capitalization; missing punctuation; missing capitalization;
irregular word distribution; lack of common words; predominance of unknown
words; inconsistent headers; headers inconsistent with file type; headers
inconsistent with file content; file type inconsistent with file content; irregular
distribution of characters; irregular distribution of words; irregular distribution
of character sequences; irregular distribution of word sequences; irregular
length of words; irregular length of sentences; irregular distribution of length of
words; irregular distribution of length of sentences; irregular file format;
irregular file encoding; unknown file format; unknown file encoding; mix of
non-alphabetic characters; unopenable file; action time; information object
creation time; information object update time; encryption; and an unexpectedly
high level of entropy.
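
The last indicator in claim 262, an unexpectedly high level of entropy, can be measured with the standard Shannon entropy of the byte distribution. The sketch below is illustrative only: the function names and the threshold of 7.5 bits per byte are assumptions, and a practical detector would presumably calibrate the threshold per file format.

```python
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_transformed(data, threshold=7.5):
    """Flag data whose entropy is unexpectedly high for plain text (threshold is assumed)."""
    return byte_entropy(data) > threshold
```

Plain English text typically sits well below this threshold, whereas encrypted or compressed data approaches 8 bits per byte, which is why entropy can hint that an object was transformed to evade identification.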

263. A method according to claim 179, comprising formulating
respective assigned elementary information unit identifier to be resilient to 
small errors. 

264. A method according to claim 179, wherein said assigning of 
elementary information unit identifier utilizes image matching. 

265. A method according to claim 179, wherein said assigning of 
elementary information unit identifier comprises a mapping to a Euclidean
space. 

266. A method according to claim 265, wherein said mapping to a 
Euclidean space comprises approximating a pairwise difference between
elementary information units. 

267. A method according to claim 266, wherein said approximating is 
such that a difference between two elementary information units approximates 
said pairwise difference between said two elementary information units. 

268. A method according to claim 266, wherein said approximation of 
said pairwise difference between elementary information units comprises an 
approximation of at least one of the following: 

semantic difference; distance measured by image matching; phonetic 
difference; and spelling difference. 
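
As one hedged example of a mapping whose Euclidean distances approximate a spelling difference (claims 265 to 268), the sketch below maps each word to a sparse vector of character-bigram counts. This particular embedding is an assumption chosen for simplicity and is not described in the source; `bigram_vector` and `distance` are hypothetical names.

```python
import math
from collections import Counter

def bigram_vector(word):
    """Map a word to a sparse vector of character-bigram counts (padded with '#')."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))

def distance(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))

# A misspelling stays close to the original; an unrelated word lands far away.
d_close = distance(bigram_vector("confidential"), bigram_vector("confidental"))
d_far = distance(bigram_vector("confidential"), bigram_vector("spreadsheet"))
```

Identifiers derived from such vectors, for example by quantizing them, would tend to agree for units that differ only by small spelling errors.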

269. A method according to claim 176, wherein said information
object is a knowledge object. 

270. A method according to claim 176, wherein said elementary 
information unit is an elementary fact. 

271. A method according to claim 270, wherein said elementary fact 
comprises at least one of the following:

sentence; database entry; representation independent description of
knowledge; modular description of knowledge; and abstract description of
knowledge.

272. A method according to claim 237, wherein said information class
is a knowledge class.

273. A method according to claim 176, further comprising a stage of 
discerning lifecycle information about a respective information object. 

274. A method according to claim 273, wherein said discerning of
information about the lifecycle of said information object comprises utilizing
information about sharing of at least one elementary information unit in said
information object, wherein said elementary information unit is shared with at
least one additional information object.

275. A method according to claim 273, wherein said discerning of 
information about the lifecycle of said information object is based on at least 
one of a group comprising: file system date information; information about
editing of said information object; and information about registration of said
information object.

276. A method according to claim 273, comprising utilizing said 
information about the lifecycle of said information object for the creation of a
lifecycle graph.

277. A method according to claim 273, comprising utilizing said
information about the lifecycle of said information object to define at least part
of the policy of said information object, said utilizing comprising identifying at
least one other information object along said information object's lifecycle and
examining a policy associated therewith.

278. A method according to claim 179, wherein said assigning of said
elementary information unit identifier is carried out a plurality of times, each
time utilizing a different method for assigning of an elementary information unit
identifier. 



279. A method according to claim 278, wherein said assigning of
elementary information unit identifier several times comprises storing
elementary information unit identifiers assigned utilizing different methods
separately.

280. A method according to claim 278, wherein said assigning of said
elementary information unit identifier several times comprises storing
elementary information unit identifiers such that identifiers assigned utilizing
different methods can be distinguished according to said method utilized to
assign them.

281. A method according to claim 278, wherein said different
methods are selected such as to optimize between at least any two of the
following:

storage space; search speed; capability to detect transformation;
capability to detect a specific transformation; resilience to transformation;
resolution of identification from among similar information objects; resolution
of identification of boundaries within compound information objects; resilience
to a specific transformation; and resilience to transformation.

282. A method according to claim 179, wherein said assigning of a
respective elementary information unit identifier comprises utilizing a method
having at least one of the following characteristics:

order sensitive to data in the elementary information unit; order
insensitive in the elementary information unit; utilizing changing definitions of
the elementary information unit such that said assigning of said elementary
information unit identifier is carried out a plurality of times using a plurality of
definitions; utilizing an exchangeable method of preprocessing, such that said
assigning of said elementary information unit identifier is carried out several
times; being omission resilient; being insertion resilient; being replacement
resilient; being dictionary based; being distribution based; being locality based;
being histogram based; and being n-gram based.

283. A method according to claim 199, wherein an information object
policy comprises at least some information about one or more methods utilized
for assigning of an elementary information unit identifier to a respective
information object.



284. A method according to claim 278, wherein said assigning
utilizing different methods comprises utilizing said different methods
sequentially until a predetermined stop condition is reached.

285. A method according to claim 179, wherein said information
object comprises spreadsheet data, and wherein said assigning of said
elementary information unit identifier assigned to said information object
comprises utilizing a method comprising at least one of the following
characteristics:

invariance to linear transformation; invariance to reordering; invariance
to permutation; resilience to linear transformation; resilience to reordering;
resilience to permutation; resilience to minor changes; resilience to cuts;
utilizing a statistical moment; utilizing a statistical moment for a table; utilizing
a statistical moment for a row; utilizing a statistical moment for a column; and
utilizing a mathematical descriptor of the information object data.
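
As an illustration only, and not part of the claims, the following minimal Python sketch shows one way a statistical moment for a column could yield a descriptor that is invariant to reordering and to linear transformation. The choice of squared skewness and kurtosis, and the names `column_descriptor` and `same_column`, are assumptions made for this example; the source does not specify which moments are used.

```python
import math


def column_descriptor(values):
    """Return (skewness squared, kurtosis) of a numeric column.

    Standardized moments are unchanged when the column is reordered or
    mapped through x -> a*x + b with a != 0. Skewness flips sign for
    negative a, so it is squared here.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return (0.0, 0.0)  # constant column carries no shape information
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return (skew * skew, kurt)


def same_column(a, b, tol=1e-6):
    """Compare two column descriptors up to a floating-point tolerance."""
    da, db = column_descriptor(a), column_descriptor(b)
    return all(math.isclose(x, y, rel_tol=tol, abs_tol=tol)
               for x, y in zip(da, db))
```

Under these assumptions, a column that has been rescaled (for example converted to another currency) and re-sorted would still match its registered descriptor, while a column with a different distribution shape would not.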



286. A method according to claim 179, comprising utilizing said
elementary information unit identifiers for said information object identification
using a technique having at least one of the following characteristics: omission
resilience; insertion resilience; replacement resilience; being dictionary based;
being distribution based; being locality based; being based on the size of
elementary information units; being based on the size of information objects;
resilience to linear transformation; resilience to reordering; resilience to
permutation; resilience to minor changes; resilience to cuts; being histogram
based; and being n-gram based.
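
The following Python sketch is an illustration only of how an n-gram based identification technique can be insertion and omission resilient. It assumes that the elementary information units are runs of three consecutive words and that identifiers are truncated SHA-1 hashes; both choices, and the function names `ngram_identifiers` and `containment`, are hypothetical and are not taken from the source.

```python
import hashlib


def ngram_identifiers(text, n=3):
    """Hash every run of n consecutive words into a short identifier."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def containment(registered, candidate, n=3):
    """Fraction of the registered object's identifiers found in candidate."""
    reg = ngram_identifiers(registered, n)
    if not reg:
        return 0.0
    return len(reg & ngram_identifiers(candidate, n)) / len(reg)
```

Because a single inserted or omitted word disturbs only the few n-grams that overlap it, most identifiers of the registered object are still found in the altered copy, and a threshold on the containment score can be used for identification.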

287. A method according to claim 176, further comprising utilizing a
client.

288. A method according to claim 287, wherein said client comprises
at least one of the following:

end point software; end point hardware; tamper resistant software;
tamper resistant hardware;
client side software; and client side hardware.

289. A method according to claim 287, comprising utilizing said
client for at least one of the following:

monitoring of client side storage;
monitoring of client side access;
monitoring of client side usage;
monitoring of client side distribution;
monitoring of copying of information object excerpts;
monitoring of clipboard;
monitoring of at least one application;
monitoring of at least one interface;
control of at least one application;
control of at least one interface;
control of clipboard;
control of copying of information object excerpts;
control of client side storage;
control of client side access;
control of client side usage; and
control of client side distribution.



290. A method according to claim 176, comprising comparing
at least two information objects to calculate pairwise similarity
between said objects.



291. A method according to claim 290, comprising utilizing said
pairwise similarity to map said information objects to a space. 



292. A method according to claim 291, wherein said space is a
Euclidean space, and wherein the closeness between any two objects within said
Euclidean space is approximately proportional to said pairwise similarity
between said information objects.

293. A method according to claim 291, wherein said space is a
weighted graph, and wherein the weight of an edge between any two objects
within said graph space is approximately proportional to said pairwise similarity
between said information objects.

294. A method according to claim 291, wherein said space is a graph,
and wherein the existence of an edge between any two objects within said graph
space is dependent on said pairwise similarity between said information objects.

295. A method according to claim 291, wherein said space is utilized
to identify at least one similarity information class, wherein said information
class consists of at least two information objects, wherein said information class
policy is a policy shared by the information class, and wherein said similarity
information class is bounded within said space.

296. A method according to claim 291, comprising utilizing said
space to identify at least one information object substantially similar to an
unidentified information object.

297. A method according to claim 291, comprising using said space to
identify at least one other information object substantially similar to an
information object for which policy is not known, thereby to obtain a policy
associated with said other information object to use as basis for a policy for said
information object.
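
By way of illustration only, the Python sketch below maps information objects to a weighted graph from their pairwise similarity and then borrows a policy from the most similar object whose policy is known. It assumes that each object is represented by a set of elementary information unit identifiers and that Jaccard similarity is the similarity measure; the source names neither, and the function names and threshold value are hypothetical.

```python
def jaccard(a, b):
    """Pairwise similarity of two identifier sets, in the range [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_graph(objects, threshold=0.3):
    """Weighted graph in which an edge exists only if similarity >= threshold."""
    names = sorted(objects)
    graph = {name: {} for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            weight = jaccard(objects[u], objects[v])
            if weight >= threshold:
                graph[u][v] = weight
                graph[v][u] = weight
    return graph


def inherit_policy(graph, policies, target):
    """Return the policy of the most similar neighbour with a known policy."""
    known = [(w, n) for n, w in graph[target].items() if n in policies]
    if not known:
        return None  # no sufficiently similar object has a policy
    return policies[max(known)[1]]
```

In this sketch the edge weight equals the similarity and the existence of an edge depends on a threshold, so the same structure serves both the weighted and the unweighted graph variants.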

298. A method according to claim 176, comprising storing
information about said information object in a database.

299. A method according to claim 176, further comprising extracting
a descriptor of said information object, based on statistical analysis of said
information object.

300. A method according to claim 176, comprising storing the order
of said elementary information units within said information object in a
database.

301. A method according to claim 300, comprising using said order 
for identification of said information object. 

302. A method according to claim 176, further comprising interfacing
with at least one of: an information management system; and a document
management system.

303. A method according to claim 176, further comprising tracking at 
least one of the following: 

usage patterns; storage patterns; and distribution patterns.

304. A method according to claim 303, wherein said tracking is 
carried out to infer information about at least one of the following: 

normal usage patterns; normal storage patterns; normal distribution
patterns; irregular usage patterns; irregular storage patterns; and irregular
distribution patterns.

305. A method according to claim 304, wherein said inferred 
information is used to define at least part of a policy. 



306. A method according to claim 304, comprising using said inferred
information for information object classification. 

307. A method according to claim 176, further comprising logging. 

308. A method according to claim 307, wherein said logging
comprises logging of at least one of the following:

actions; events; and information object identification.

309. A method according to claim 307, wherein at least part of said
logging is controlled by a policy.



310. A method according to claim 307, wherein at least part of said
logging is stored in a database.

311. A method according to claim 307, comprising utilizing said 
logging to augment lifecycle information for said information object. 

312. A method according to claim 176, further comprising assessing
the integrity of at least one information object, wherein said integrity
assessment consists of comparing said information object with a version of said
information object for which integrity is assured.
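
As a minimal sketch only, the comparison with an integrity-assured version could be carried out on cryptographic digests rather than on the full content. The use of SHA-256 and the function names below are assumptions for illustration; the source does not state how the comparison is performed.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """SHA-256 digest of an information object's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def integrity_ok(candidate: bytes, assured: bytes) -> bool:
    """True if the candidate is byte-identical to the assured version."""
    # compare_digest avoids leaking, through timing, where the digests differ.
    return hmac.compare_digest(digest(candidate), digest(assured))
```

Only the digest of the assured version needs to be retained for this check, although retaining the full assured version would also be needed wherever the object is to be replaced by it.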

313. A method according to claim 312, further comprising issuing a
certificate of said integrity for at least one information object.

314. A method according to claim 313, wherein said certificate is a 
cryptographic certificate. 

315. A method according to claim 312, further comprising replacing 
said information object with said version of said information object for which 
said integrity is assured. 

316. A method according to claim 312, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing distribution of said information object.

317. A method according to claim 312, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing storage of said information object.



318. A method according to claim 312, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing usage of said information object.



319. A method according to claim 176, further comprising defining at
least one constituent information object to be an ignored information object, and
wherein, whenever said ignored information object is an element of a
compound information object, ignoring said object in identification of said
compound information object.

320. A method according to claim 199, further comprising changing 
access control information in accordance with said policy. 

321. A method according to claim 176, further comprising not
allowing usage of respective ones of said information objects outside an
organization.



322. A method according to claim 176, further comprising not
allowing storage of respective ones of said information objects outside an
organization.



323. A method according to claim 176, further comprising not
allowing distribution of respective ones of said information objects outside an
organization.

324. A method according to claim 201, wherein said policy comprises
at least one mandatory lifecycle. 

325. A method according to claim 324, wherein said action is
dependent on the matching of said mandatory lifecycle with a lifecycle of a
respective event.

326. A method according to claim 324, wherein said mandatory
lifecycle comprises at least one mandatory recipient of said information object;
and an order of events concerning said information object.

327. A method according to claim 205, wherein said inserting an
additional part to said information object comprises inserting at least one of the
following: a header; a footer; and a disclaimer.

328. A method according to claim 199 or claim 213, further
comprising defining areas and wherein said policy is dependent on whether an
action is taken inside a given defined area.

329. A method according to claim 199 or claim 213, further
comprising defining areas and wherein said policy is dependent on whether an
event occurs inside a given defined area.

330. A method according to claim 176 or claim 179, comprising using 
said deducing to locate at least one information object with similar content to a 
given information object. 

331. A method according to claim 199, comprising attaching a 
respective policy to information objects according to their logical location 
within an information storage medium. 

332. A method according to claim 331, further comprising utilizing a
crawler for automatic location of information objects within said information
storage medium.

333. A method according to claim 332, wherein said information
storage medium is a file system.

334. A method according to claim 330, wherein said locating is done
in an information storage medium.

335. A method according to claim 334, further comprising utilizing a
crawler for automatic location of information objects within said information
storage medium.

336. A method according to claim 334, wherein said information storage 
medium comprises at least one file system. 



337. A method for automated computerized exchange of information
within an information object having overall coherency, the method comprising
selecting amongst and carrying out at least one of the following:
deleting part of said information;
replacing part of said information; and
inserting an additional part to said information,
wherein said carrying out additionally comprises preservation of the
coherency of said information within said information object.

338. A method according to claim 337, wherein said changing of said 
information is done in order to eliminate parts having policies that do not allow 
for at least one action to be executed while they are in the document. 

339. A method according to claim 337, wherein said changing of said 
information is carried out in order to personalize said information. 

340. A method according to claim 337, wherein said changing of said
information is carried out in order to customize said information for a specific
use.

341. A method according to claim 337, wherein said preserving said
coherency comprises at least one of:

maintaining seamlessness; preserving the structure of said information;
preserving the linguistic coherency of said information; preserving the
formatting style of said information; and preserving the pagination style of said
information.

342. A method according to claim 337, wherein said information
objects comprise compound information objects and wherein said changing of
said information object is made to constituent parts of a compound information
object.

343. A method according to claim 337, carried out over a network
having users with different access rights to said information object, said
selecting and carrying out being to adapt said information object to conform to
access rights of one of said users to whom said information object is released.

344. Apparatus for automatic information identification to enforce an
information management policy on information objects, the apparatus
comprising:

a scanning module for finding elementary information units within said
information object; and
a deduction module for deducing information about the identity of said
information object from identification of said elementary information units
found within said information object, said deduced identity being usable to
obtain a corresponding policy rule for applying to said information object.

345. Apparatus according to claim 344, wherein said information
objects comprise at least one simple information object, said simple information
object comprising one of the following:

an elementary information unit;
a set of elementary information units; and
an ordered set of elementary information units.

346. Apparatus according to claim 344, wherein said elementary
information units comprise at least one of the following:

a sentence; a sequence of words; a word; a sequence of characters; a
character; a sequence of numbers; a number; a sequence of digits; a digit; a
vector; a curve; a pixel; a block of pixels; an audio frame; a musical note; a
musical bar; a visual object; a sequence of video frames; a sequence of musical
notes; a sequence of musical bars; and a video frame.

347. Apparatus according to claim 344, wherein said deduction
module is further configured to assign elementary information unit identifiers to
elementary information units after identification.



348. Apparatus according to claim 347, wherein said deduction
module is further configured to utilize said elementary information unit
identifiers in said deducing.

349. Apparatus according to claim 344, wherein said information
object identification is carried out on an instance of said information object, said
information object instance being said information object in a specific format.

350. Apparatus according to claim 347, wherein said deduction
module is configured to provide said elementary information unit identifiers in a
manner determined at least partly by the content of said elementary information
units which they are assigned to.

351. Apparatus according to claim 350, wherein said elementary 
information unit identifiers are solely determined by said content. 

352. Apparatus according to claim 347, wherein said deduction 
module is configured to provide said elementary information unit identifiers in
a manner at least partly determined by locations within an information object of 
respective elementary information units to which they are assigned. 

353. Apparatus according to claim 344, further comprising a policy
attachment unit associated with said deduction module, said policy attachment
unit being configured to use said deducing to attach to said information object
an information object policy, said policy comprising at least one of the
following:

an allowed distribution of said information object;
a restriction on distribution of said information object;
an allowed storage of said information object;
a restriction on storage of said information object;
an action to be taken as a reaction to an event;
an allowed usage of said information object; and
a restriction on usage of said information object.

354. Apparatus according to claim 344, wherein said deducing
comprises utilizing conditional probabilities for at least one of the following:
identification of information objects;
classification of information objects; and
identification of a knowledge domain of information objects.
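
For illustration only, one well-known way to use conditional probabilities for classification is a naive Bayes model over elementary information units. The sketch below is such a model and is not drawn from the source: the class name `ConditionalClassifier`, the Laplace smoothing and the use of single words as units are all assumptions.

```python
import math
from collections import Counter, defaultdict


class ConditionalClassifier:
    """Naive Bayes over elementary information units (here, words)."""

    def __init__(self):
        self.class_counts = Counter()            # training objects per class
        self.unit_counts = defaultdict(Counter)  # unit frequencies per class
        self.vocabulary = set()

    def train(self, units, label):
        self.class_counts[label] += 1
        self.unit_counts[label].update(units)
        self.vocabulary.update(units)

    def classify(self, units):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log P(class)
            denom = sum(self.unit_counts[label].values()) + len(self.vocabulary)
            for unit in units:
                # Laplace-smoothed estimate of P(unit | class).
                score += math.log((self.unit_counts[label][unit] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The class labels could equally stand for specific information objects or for knowledge domains, in which case the same computation performs identification rather than classification.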



355. Apparatus for automated computerized exchange of information
within an information object having overall coherency, the apparatus
comprising a selector for selecting amongst at least one of the following data
modifications:

a deletion of part of said information;
a replacement of part of said information; and
an insertion of an additional part to said information,
the apparatus further comprising a data modification unit associated with
said selector for carrying out said selected modification within said information
object, said data modification unit being associated with a coherency retention
module for detecting coherency features of said information object and altering
said modification in order to preserve said detected coherency features within
said information object.



356. Apparatus for automatic information identification of
information objects, the apparatus comprising:

a scanning module for finding elementary information units within said
information object; and
a deduction module for deducing information about the identity of said
information object from identification of said elementary information units
found within said information object, said deduced identity being usable for
controlling use of said information object.

357. Apparatus according to claim 356, wherein said deduction 
module is further configured to assign elementary information unit identifiers to 
elementary information units after identification. 

AMENDED CLAIMS

[Received by the International Bureau on 11 May 2004 (11.05.04)] 

1. A method for monitoring information content carried in a
medium, the method comprising:
monitoring said medium for said information;
seeking elementary information units within objects of said information
being monitored in said medium;
identifying said elementary information units; and
deducing information about the content of said information objects from
identification of said elementary information units found within said objects.



2. A method according to claim 1, wherein said medium comprises 
at least one of the following: 

a distribution channel; and 
a storage medium.

3. A method according to claim 1, wherein said information 
objects comprise at least one simple information object, said simple information 
object comprising one of the following: 

an elementary information unit;

a set of elementary information units; and 

an ordered set of elementary information units. 

4. A method according to claim 1, wherein said elementary
information units comprise at least one of the following:

a sentence; a sequence of words; a word; a sequence of characters; a
character; a sequence of numbers; a number; a sequence of digits; a digit; a
vector; a curve; a pixel; a block of pixels; an audio frame; a musical note; a
musical bar; a visual object; a sequence of video frames; a sequence of musical
notes; a sequence of musical bars; and a video frame.

5. A method according to claim 1, further comprising assigning
elementary information unit identifiers to elementary information units after
identification.

6. A method according to claim 5, wherein said elementary
information unit identifiers are utilized in said deducing.

7. A method according to claim 1, wherein said information object 
identification is carried out on an instance of said information object, said 
information object instance being said information object in a specific format.

8. A method according to claim 7, wherein said format comprises at
least one of the following:

jpeg image; gif image; Word document format; Lotus Notes format;
mpeg format; text format; rich text format; Unicode text format; multi byte text
encoding format; formatted text format; ASCII text format; HTML; XML;
PDF; postscript; MS-Excel spreadsheet; MS-Excel drawing; MS-Visio drawing;
Photoshop drawing; AutoCAD drawing format; and CAD drawing format.

9. A method according to claim 5, wherein said elementary 
information unit identifiers are determined by the content of said elementary 
information units which they are assigned to. 

10. A method according to claim 9, wherein said elementary 
information unit identifiers are solely determined by said content.

11. A method according to claim 5, wherein said elementary 
information unit identifiers are at least partly determined by locations within an
information object of respective elementary information units to which they are 
assigned. 

12. A method according to claim 5, wherein said elementary
information unit identifiers are at least partly determined by the content of an
elementary information unit in proximity to said elementary information units
to which they are assigned.

13. A method according to claim 5, comprising storing said
elementary information unit identifiers in a database.

14. A method according to claim 13, further comprising using said
elementary information unit identifiers stored in said database for identifying
at least one further, unidentified, information object.

15. A method according to claim 13, further comprising using said
elementary information unit identifiers stored in said database for comparing
information objects.

16. A method according to claim 5, comprising storing only some of 
said elementary information unit identifiers in a database.

17. A method according to claim 16, wherein said storing of only
some of said elementary information unit identifiers in a database is to achieve
at least one of the following:
reduce storage cost;
increase efficiency of assigning of said elementary information unit
identifiers to said elementary information units by only performing said
assignment for elementary information unit identifiers that are stored in said
database; and
increase the efficiency of searching for said elementary information
unit identifiers in said database.

18. A method according to claim 16, wherein said storage of only
some of said elementary information unit identifiers in a database is done in a
manner that ensures that any area of a given size in said information object
contains a predetermined minimum number of said stored elementary
information units.
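
One known selection scheme with a coverage guarantee of this kind is winnowing, which keeps the minimum hash of every sliding window. The Python sketch below uses winnowing purely as an illustration; the source does not name it, it guarantees a minimum of only one stored identifier per window, and the function names, hash truncation and window size are assumptions.

```python
import hashlib


def unit_hashes(words, n=3):
    """One integer hash per run of n consecutive words, in document order."""
    return [
        int(hashlib.sha1(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()[:12], 16)
        for i in range(len(words) - n + 1)
    ]


def select_for_storage(hashes, window=4):
    """Return sorted positions of the hashes to store.

    The minimum hash of every run of `window` consecutive hashes is kept,
    so each such run contains at least one stored identifier.
    """
    if len(hashes) < window:
        return [hashes.index(min(hashes))] if hashes else []
    kept = set()
    for start in range(len(hashes) - window + 1):
        segment = hashes[start:start + window]
        smallest = min(segment)
        # Rightmost minimum, as in the usual winnowing formulation.
        offset = max(i for i, h in enumerate(segment) if h == smallest)
        kept.add(start + offset)
    return sorted(kept)
```

Making the window size depend on properties of the information object, such as its size or confidentiality level, would trade storage cost against the smallest excerpt that can still be detected.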

19. A method according to claim 18, wherein said given size is
dependent on properties of a respective information object.

20. A method according to claim 19, wherein said properties of said
information object comprise at least one of the following:
importance; size; confidentiality level; and format.

21. A method according to claim 18, wherein said minimum number
is dependent on properties of said information object.

22. A method according to claim 21, wherein said properties of said 
information object comprise at least one of the following: 

importance; size; confidentiality level; and format.

23. A method according to claim 3, wherein said information objects 
comprise at least one compound information object, said compound information 
object comprising at least one of the following: 

a simple information object; a compound information object; an ordered
set of compound information objects; an ordered set of simple information
objects; and an ordered set of compound and simple information objects.

24. A method according to claim 1, wherein said information 
comprises at least one of the following:

numeric data; spreadsheet data; numeric spreadsheet data; textual 
spreadsheet data; word processor data; textual data; hyper text data; audio data; 
visual data; multimedia data; binary data; raw data; database data; video data; 
drawing data; chart data; picture data; and image data. 

25. A method according to claim 1, wherein monitoring is done in at
least one of the following:

Firewall; Web server; Web proxy; HTTP proxy; HTTP server; SMTP
gateway; SMTP server; Fax server; SOCKS proxy; Sniffer; Server; WAN
gateway; proxy; Router; Mail server; file server; client; file system; gateway;
router; application; operating system; database; database accessing utility;
database accessing server; Internal mail server; External mail server; Message
board; NNTP server; and an IRC server.

26. A method according to claim 1, wherein monitoring is carried
out on at least one of the following traffic types:

Instant messaging; IP; HTTP; Mail; TCP; UDP; Web; Streaming; Chat;
IRC; computer network; LAN; WAN; VPN; POP3; MAPI; FTP; NNTP; File
transfer; IMAP; SMTP; and Fax.

27. A method according to claim 1, wherein monitoring is done by at
least one of the following:

buffering; caching; forwarding; sniffing; and relaying.

28. A method according to claim 1, wherein monitoring comprises 
at least one of the following: 

blocking traffic; altering traffic; and altering traffic such as to invalidate
said traffic.

29. A method according to claim 1, comprising carrying out said 
monitoring at a proxy. 

30. A method according to claim 29, comprising routing traffic to be
monitored to said proxy.

31. A method according to claim 29, comprising blocking any traffic 
requiring monitoring which manages to bypass said proxy. 

32. A method according to claim 31, comprising using a firewall to
carry out said blocking. 

33. A method according to claim 29, wherein said proxy is a SOCKS
proxy.

34. A method according to claim 29, wherein said proxy is an HTTP
proxy.

35. A method according to claim 1, comprising monitoring instant
messaging traffic.



36. A method according to claim 35, comprising monitoring file 
distribution controlled by said instant messaging traffic. 

37. A method according to claim 36, comprising altering said instant 
messaging traffic controlling said file distribution, thereby to facilitate capturing 
said file distribution. 



38. A method according to claim 1, comprising using said deducing
to attach to said information object an information object policy, said policy
comprising at least one of the following:

an allowed distribution of said information object;
a restriction on distribution of said information object;
an allowed storage of said information object;
a restriction on storage of said information object;
an action to be taken as a reaction to an event;
an allowed usage of said information object; and
a restriction on usage of said information object.

39. A method according to claim 38, wherein said information
object policy comprises at least one action to be taken as a reaction to an event,
and wherein said action comprises at least one of the following:

preventing distribution of said information object; preventing storage of
said information object; preventing usage of said information object; reporting
distribution of said information object; reporting storage of said information
object; reporting usage of said information object; reporting; alerting about
distribution of said information object; alerting about storage of said information
object; alerting about usage of said information object; alerting; logging
distribution of said information object; logging storage of said information
object; logging usage of said information object; logging; notifying about
distribution of said information object; notifying about storage of said
information object; notifying about usage of said information object; notifying;
notifying to an administrator; notifying to a manager; notifying to a recipient;
notifying to a sender; notifying to an owner of said information object;
quarantine; alerting an administrator; alerting a manager; alerting a recipient;
alerting a sender; alerting an owner of said information object; reporting to an
administrator; reporting to a manager; reporting to a recipient; reporting to a
sender; reporting to an owner of said information object; encrypting said
information object; changing said information object; replacing said
information object; and utilizing digital rights management technology on said
information object.

40. A method according to claim 38, wherein said information object
policy comprises at least one action to be taken as a reaction to an event, and
wherein said event comprises at least one of the following:

attempted distribution of said information object; attempted storage of
said information object; attempted usage of said information object;
distribution of said information object; storage of said information object; and
usage of said information object.

41. A method according to claim 38, wherein said information object 
usage comprises at least one of the following:

copying an excerpt; editing; copying to clipboard; copying an excerpt to 
clipboard; changing format; changing encoding; encryption; decryption; 
changing digital management; opening by an application; and printing. 

42. A method according to claim 38, wherein said information object
policy comprises placing a substantially imperceptible marking in said
information object, said marking comprising information content, and said
method comprising placing said marking, when indicated by said policy, before
allowing at least one of the following:

storage of said information object; usage of said information object; and
distribution of said information object.

43. A method according to claim 42, wherein said information
content for storage in said marking comprises at least one of the following:

the identity of said information object;
the identity of a user performing the action in respect to said information
object;
the identity of a user authorizing the action in respect to said information
object;
the identity of a user overriding policy and approving the action in
respect to said information object; and
the identity of a user requesting the action in respect to said information
object.

44. A method according to claim 38, wherein said information object
policy further comprises changing said information object by at least one of the
following:

deleting part of said information object; replacing part of said
information object; and inserting an additional part to said information object
before allowing at least one of the following actions:

storage of said information object; usage of said information object; and
distribution of said information object.

45. A method according to claim 44, wherein said changing of said
information object is done in order to eliminate parts having policies that do not
allow for said action to be executed while they are in the document.

46. A method according to claim 44, wherein said changing of said
information object is carried out in order to personalize said information object.

47. A method according to claim 44, wherein said changing of said
information object is carried out in order to customize said information object
for a specific use.

48. A method according to claim 44, wherein said changing of said
information object is done in a manner selected to achieve at least one of the
following:

preserving the coherency of said information object; seamlessness;
preserving the structure of said information object; preserving the linguistic
coherency of said information object; preserving the formatting style of said
information object; and preserving the pagination style of said information object.

49. A method according to claim 44, wherein said information
objects comprise compound information objects and wherein said changing of
said information object is made to constituent parts of a compound information
object.



50. A method according to claim 38, wherein said storing comprises
storage in at least one of the following:

a portable media device; a floppy disk; a hard drive; a portable hard
drive; a flash card; a flash device; disk on key; magnetic tape; magnetic media;
optic media; punched cards; a machine readable media; a CD; a DVD; a
firewire device; a USB device; and a hand held computer.



51. A method according to claim 38, wherein said policy comprises
distribution regulation, said distribution regulation being for regulating at least
one of the following:

sending said information object via mail;
sending said information object via web mail;
uploading said information object to a web server;
uploading said information object to an FTP server;
sending said information object via a file transfer application;
sending said information object via an instant messaging application;
sending said information object via a file transfer protocol; and
sending said information object via an instant messaging protocol.

52. A method according to claim 38, wherein said policy is
dependent on at least one of the following:

the domain of a respective information object; the identity of a system;
the identity of a user; the identity level of a user authorizing an action; the
identity of a user requesting an action; the identity of a user involved in an
action; the identity of a user receiving an information object; the authentication
level of a system; the authentication level of a user; the authentication level of a
user requesting an action; the authentication level of a user authorizing an
action; the authentication level of a user involved in an action; the
authentication level of a user receiving said information object; the
authentication level of a user sending said information object; the format of an
information object instance; an interface being used; an application being used;
encryption being used; digital rights management technology being used;
detection of transformation, wherein said transformation is operable to reduce
the ability to identify said transformed information object; information object
integrity; regular usage pattern; regular distribution pattern; regular storage
pattern; information path; consistency of an action with usage pattern; the
identity of a user overriding policy and authorizing the action in respect to said
information object; the authentication level of a user overriding policy and
authorizing the action in respect to said information object; the identity of a user
sending said information object; information property of said information object;
language of said information object; representation of said information object;
operations done on said information object; identity of users involved along
the life cycle of said information object; application used on said information
object; transition channel of said information object; participant agents; virtual
location of a computer; logical location of a computer; physical location of a
computer; type of a computer; type of a laptop computer; type of a desktop
computer; type of a server computer; and owner identity.

53. A method according to claim 38, further comprising enabling at
least one user to override at least one of the decisions contained within said policy.

54. A method according to claim 1, wherein said deducing
comprises utilizing conditional probabilities for at least one of the following: 

identification of information objects; 

classification of information objects; and

identification of a knowledge domain of information objects. 

55. A method according to claim 38, wherein at least part of said 
policy is stored in a database. 

56. A method according to claim 1, wherein said deducing further
comprises utilizing keywords for at least one of the following:

identification of information objects; identification of elementary 
information units; classification of information objects; and identification of the 
domain of information objects. 

57. A method according to claim 56, wherein said keywords are 
stored in a database. 

58. A method according to claim 56, wherein said keywords are 
stored in at least one of the following forms: 

hash value; raw string; and numeric representation. 
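
As an illustration of the "hash value" storage form named in claim 58, the following is a minimal sketch and not the claimed implementation. The class name `KeywordStore`, the helper `keyword_hash`, the choice of SHA-256 and the sample keywords are all assumptions introduced here.

```python
import hashlib
from typing import Iterable, Set


def keyword_hash(keyword: str) -> str:
    """Hash a keyword after lowercasing it (SHA-256 is an assumed choice)."""
    return hashlib.sha256(keyword.lower().encode("utf-8")).hexdigest()


class KeywordStore:
    """Hypothetical store that keeps only hash values, not the raw keywords."""

    def __init__(self, keywords: Iterable[str]) -> None:
        self._hashes: Set[str] = {keyword_hash(k) for k in keywords}

    def matches(self, text: str) -> Set[str]:
        """Return the words of `text` whose hash is present in the store."""
        return {w for w in text.lower().split() if keyword_hash(w) in self._hashes}


store = KeywordStore(["Falcon", "merger"])
hits = store.matches("Project falcon kickoff is on Monday")
```

Storing hashes instead of raw strings means that reading the store does not by itself reveal the keyword list, at the cost of supporting only exact-word matching in this sketch.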

59. A method according to claim 38, wherein at least part of said
policy is defined in terms of a logic expression.

60. A method according to claim 59, wherein said expression is 
evaluated by lazy evaluation. 

61. A method according to claim 59, wherein at least some of the
variables in said logic expression comprise at least one of the following:

an external function; an external function based on group membership; 
and an external variable. 
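
Claims 59 to 61 describe a policy written as a logic expression, evaluated lazily, whose variables may be external functions (for example group membership) or external variables. A minimal sketch of that idea follows, assuming deferred terms are modelled as zero-argument callables. The names `in_group`, `external_var`, `all_of`, `any_of`, the `GROUPS` table and the sample users are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical external data; names are illustrative only.
GROUPS: Dict[str, Set[str]] = {"legal": {"alice"}, "finance": {"bob"}}
calls: List[str] = []  # records which external terms were actually evaluated


def in_group(user: str, group: str) -> Callable[[], bool]:
    """Return a deferred group-membership test (an 'external function')."""
    def thunk() -> bool:
        calls.append(f"in_group({user},{group})")
        return user in GROUPS.get(group, set())
    return thunk


def external_var(name: str, env: Dict[str, bool]) -> Callable[[], bool]:
    """Return a deferred lookup of an 'external variable'."""
    def thunk() -> bool:
        calls.append(f"var({name})")
        return env[name]
    return thunk


def all_of(*terms: Callable[[], bool]) -> Callable[[], bool]:
    # all() over a generator stops at the first False term.
    return lambda: all(t() for t in terms)


def any_of(*terms: Callable[[], bool]) -> Callable[[], bool]:
    # any() over a generator stops at the first True term.
    return lambda: any(t() for t in terms)


env = {"encrypted": False}
# allow = in_group(alice, legal) OR (in_group(alice, finance) AND encrypted)
allow = any_of(
    in_group("alice", "legal"),
    all_of(in_group("alice", "finance"), external_var("encrypted", env)),
)
result = allow()
```

Because the first term is already true, the second branch is never evaluated, so the potentially costly external lookups behind it are skipped.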

62. A method according to claim 38, wherein at least part of said policy
is defined in terms of any one of a group comprising rules, imposed restrictions,
granted privileges, reaction to one or more given events, group operations, a
property of said information object, a property of a user, a property of a
computer, a property of an entity, and a hierarchy of calculations.



63. A method according to claim 38, wherein at least part of said
policy is defined in terms of a role, wherein said role consists of a property of at
least one of a user and a system and wherein said role further comprises at least
one authorization.

64. A method according to claim 38, wherein at least part of said
policy is defined in terms of at least one of the following languages:

a scripting language; an ordered calculation language; a programming
language; an interpreted language; and a functional language.

65. A method according to claim 64, wherein said at least one of said
following languages comprises instructions for the operation of an ordered
calculation resulting in at least one of the following:

policy; instruction to perform an action; restriction; and allowance.

66. A method according to claim 38, wherein said information object
is a compound information object comprising constituent simple information
objects, and a respective policy assigned to said information object comprises
different policies for at least some of said constituent information objects.

67. A method according to claim 1, wherein at least one user is
defined in an owner definition as an owner of said information object.

68. A method according to claim 67, wherein said owner definition 
is stored in a database. 

69. A method according to claim 1, wherein said deducing further
comprises utilizing organizational structure information.

70. A method according to claim 69, wherein said organizational
structure information comprises at least one of the following:

user superiority; working groups; organizational hierarchy; departmental
separation; and membership in working groups.


71. A method according to claim 38, comprising using
organizational structure information in order to assign a respective policy
object.

72. A method according to claim 69, wherein at least part of said
organizational structure information is stored in a database.

73. A method according to claim 69, wherein at least part of said
organizational structure information is used for information object
classification.

74. A method according to claim 69, wherein at least part of said
organizational structure information is imported from at least one of the
following:

organizational data system; data management system; organizational
data management system; knowledge management system; user directory;
LDAP server; document; and an organizational chart.

75. A method according to claim 1, further comprising making use
of at least one user interface operable to assist in at least one of the following:

classification; policy definition; template definition; approving and
revising automatic template definition; importing organizational structure
information; revising organizational structure information; producing reports;
overriding policy decisions; and providing authorizations.

76. A method according to claim 38, comprising defining an
information class as a group consisting of at least two information objects, said
defining further comprising associating with said information class a
corresponding class policy being a policy shared by said information objects.

77. A method according to claim 76, wherein said information class
policy comprises at least a part of respective policies of said information objects
within said class.

78. A method according to claim 1, further comprising using
template information objects to represent commonly repeated information, such
that a template information object together with a difference information object
representing instance specific information are together formable to produce a
compound information object in which common and specific information are
respectively identifiable.

79. A method according to claim 78, comprising using said template
information object in identifying any unknown information object
comprising information corresponding to said template information object.

80. A method according to claim 78, wherein said template
information object is a compound information object, wherein said template
information object comprises at least one placeholder, and wherein said method
comprises replacing said placeholder by at least part of said difference
information object when said difference information object and a respective
template information object are combined.


81. A method according to claim 80, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising specialization information to identify a respective specialization of
said specialized placeholder.

82. A method according to claim 80, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising a restriction about information objects permitted for replacing said
specialized placeholder.

83. A method according to claim 78, wherein said template
information object comprises at least one of a group comprising: a disclaimer; a
form; a header; a footer; a contract; and an invoice.

84. A method according to claim 81, wherein said specialized
placeholder comprises a restriction about information objects permitted for
replacing said specialized placeholder, and wherein said restriction comprises a
rule for excluding at least one of the following:

an object comprising numeric information; 
an object comprising a word; 
an object comprising a character; 
an object comprising a digit; 
an object comprising a sentence; and 
an object comprising a simple information object. 
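
The template mechanism of claims 80 to 84 (placeholders replaced by parts of a difference information object, optionally subject to a restriction) can be sketched as follows. This is an illustration under assumptions, not the claimed method: the `{name}` placeholder syntax, the function names `combine` and `no_digits`, and the sample template are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional

# Assumed placeholder syntax: {name}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def no_digits(value: str) -> bool:
    """Example restriction: exclude replacement objects that contain a digit."""
    return not any(ch.isdigit() for ch in value)


def combine(template: str,
            difference: Dict[str, str],
            restrictions: Optional[Dict[str, Callable[[str], bool]]] = None) -> str:
    """Replace each placeholder with the matching part of the difference object."""
    restrictions = restrictions or {}

    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        value = difference[name]
        check = restrictions.get(name)
        if check is not None and not check(value):
            raise ValueError(f"value for placeholder {name!r} violates its restriction")
        return value

    return PLACEHOLDER.sub(substitute, template)


template = "Dear {recipient}, invoice {number} is attached."
document = combine(
    template,
    {"recipient": "Dana", "number": "1042"},
    restrictions={"recipient": no_digits},
)
```

Here `recipient` plays the role of a specialized placeholder whose restriction excludes objects comprising a digit, while `number` is unrestricted.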

85. A method according to claim 78, comprising defining a template 
information object and wherein said defining comprises automatically 
identifying a template information object candidate. 

86. A method according to claim 85, wherein said automatically 
identifying a template information object candidate comprises identification of 
shared elementary information units of at least two information objects. 

87. A method according to claim 85, wherein said step of
automatically identifying a template information object candidate comprises
identification of substantially similar information objects.

88. A method according to claim 85, wherein said step of
automatically identifying a template information object candidate comprises the
use of at least one of text parsing; and text matching.

89. A method according to claim 78, comprising deriving at least a
part of a respective information object policy associated with a template
instance information object from an information object policy of the respective
originating template information object.

90. A method according to claim 38, wherein at least a part of an
information object policy of a respective information object is derived from a
default information object policy when said part of said information object
policy of said information object is not explicitly defined.

91. A method according to claim 5, comprising applying
preprocessing to said elementary information units before assigning identifiers
thereto.

92. A method according to claim 91, wherein said preprocessing is
done in order to enhance at least one of efficiency and robustness.


93. A method according to claim 91, wherein said preprocessing
comprises at least one of canonization; removal of common words; removal of
words not having a substantial effect on the meaning of the text; removal of
punctuation; correction of spelling; canonization of spelling; scene detection;
canonizing size; canonizing orientation; canonizing color; removing color;
reducing noise; enhancing area separation; enhancing borders; enhancing lines;
sharpening; blurring; removal of elementary information units substantially
similar to neighboring elementary information units; canonization of grammar;
and transformation to a phonetic representation.
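
For textual information objects, a few of the preprocessing steps listed in claim 93 (canonization of case, removal of punctuation, removal of common words) can be sketched as below. The function name `preprocess` and the word list `COMMON_WORDS` are assumptions for illustration; the claim does not prescribe a particular list or order of steps.

```python
import string
from typing import List

# Illustrative common-word list; a real deployment would choose its own.
COMMON_WORDS = {"a", "an", "the", "of", "and", "to", "is", "in"}


def preprocess(text: str) -> List[str]:
    """Canonize a text fragment into a list of normalized words."""
    text = text.lower()  # canonize case
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    words = text.split()  # split() also collapses runs of whitespace
    return [w for w in words if w not in COMMON_WORDS]  # remove common words


units = preprocess("The Quarterly  report, of the Board: CONFIDENTIAL!")
```

After this step, superficially different renderings of the same words map to the same sequence, which is what makes identifiers assigned afterwards more robust to trivial edits.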


94. A method according to claim 91, comprising carrying out said
preprocessing so as to ensure that any area of a given size in said information
object contains at least a predetermined number of said elementary information
units having an assigned elementary information unit identifier.


95. A method according to claim 94, wherein said given size is
dependent on properties of said information object.

96. A method according to claim 95, wherein said properties of said
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

97. A method according to claim 94, wherein said predetermined
number is dependent on properties of said information object.

98. A method according to claim 97, wherein said properties of said
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

99. A method according to claim 1, further comprising a stage of
detection of information objects having undergone transformations.

100. A method according to claim 99, wherein said stage of detection is
aimed for detection of transformations intended to reduce the ability to identify
said information object.

101. A method according to claim 99, wherein said detection of
information objects that have undergone transformation comprises detection of
at least one of a group comprising:

transformation artifacts; spelling mistakes; wrong grammar; wrong
punctuation; wrong capitalization; missing punctuation; missing capitalization;
irregular word distribution; lack of common words; predominance of unknown
words; inconsistent headers; headers inconsistent with file type; headers
inconsistent with file content; file type inconsistent with file content; irregular
distribution of characters; irregular distribution of words; irregular distribution
of character sequences; irregular distribution of word sequences; irregular
length of words; irregular length of sentences; irregular distribution of length of
words; irregular distribution of length of sentences; irregular file format;
irregular file encoding; unknown file format; unknown file encoding; mix of
non-alphabetic characters; unopenable file; action time; information object
creation time; information object update time; encryption; and an unexpectedly
high level of entropy.
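
One of the indicators listed in claim 101, "an unexpectedly high level of entropy", can be illustrated with a byte-level Shannon entropy measurement. The sketch below is an assumption about how such a check might look; the function names and the threshold of 7.5 bits per byte are hypothetical and are not taken from the specification.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_transformed(data: bytes, threshold: float = 7.5) -> bool:
    """Flag content whose byte entropy exceeds a hypothetical threshold."""
    return shannon_entropy(data) > threshold


plain = b"the quick brown fox jumps over the lazy dog " * 50
uniform = bytes(range(256)) * 16  # every byte value equally frequent
```

Natural-language text uses a small part of the byte range and stays well below the threshold, whereas encrypted or compressed content tends toward the 8-bit maximum, as the `uniform` sample does.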



102. A method according to claim 5, comprising formulating
respective assigned elementary information unit identifier to be resilient to
small errors.

103. A method according to claim 5, wherein said assigning of
elementary information unit identifier utilizes image matching.

104. A method according to claim 5, wherein said assigning of
elementary information unit identifier comprises a mapping to a Euclidean
space.

105. A method according to claim 104, wherein said mapping to a
Euclidean space comprises approximating a pairwise difference between
elementary information units.

106. A method according to claim 105, wherein said approximating is
such that a difference between two elementary information units approximates
said pairwise difference between said two elementary information units.

107. A method according to claim 105, wherein said approximation of
said pairwise difference between elementary information units comprises an
approximation of at least one of the following:

semantic difference; distance measured by image matching; phonetic
difference; and spelling difference.

108. A method according to claim 1, wherein said information object
is a knowledge object.

109. A method according to claim 1, wherein said elementary
information unit is an elementary fact.

110. A method according to claim 109, wherein said elementary fact
comprises at least one of the following:

sentence; database entry; representation independent description of
knowledge; modular description of knowledge; and abstract description of
knowledge.

111. A method according to claim 76, wherein said information class 
is a knowledge class. 

112. A method according to claim 1, further comprising a stage of
discerning lifecycle information about a respective information object.

113. A method according to claim 112, wherein said discerning of
information about the lifecycle of said information object comprises utilizing
information about sharing of at least one elementary information unit in said
information object, wherein said elementary information unit is shared with at
least one additional information object.

114. A method according to claim 112, wherein said discerning of
information about the lifecycle of said information object is based on at least
one of a group comprising: file system date information; information about
editing of said information object; and information about registration of said
information object.

115. A method according to claim 112, comprising utilizing said
information about the lifecycle of said information object for the creation of a
lifecycle graph.

116. A method according to claim 112, comprising utilizing said
information about the lifecycle of said information object to define at least part
of the policy of said information object, said utilizing comprising identifying at
least one other information object along said information object's lifecycle and
examining a policy associated therewith.

117. A method according to claim 5, wherein said assigning of said
elementary information unit identifier is carried out a plurality of times, each
time utilizing a different method for assigning of an elementary information unit
identifier.

118. A method according to claim 117, wherein said assigning of
elementary information unit identifier several times comprises storing
elementary information unit identifiers assigned utilizing different methods
separately.

119. A method according to claim 117, wherein said assigning of said
elementary information unit identifier several times comprises storing
elementary information unit identifiers assigned utilizing different methods such
that they can be distinguished according to said method utilized to assign them.


120. A method according to claim 117, wherein said different
methods are selected such as to optimize between at least any two of the
following:

storage space; search speed; capability to detect transformation;
capability to detect a specific transformation; resilience to transformation;
resolution of identification from among similar information objects; resolution
of identification of boundaries within compound information objects; resilience
to a specific transformation; and resilience to transformation.

121. A method according to claim 5, wherein said assigning of a
respective elementary information unit identifier comprises utilizing a method
having at least one of the following characteristics:

order sensitive to data in the elementary information unit; order
insensitive in the elementary information unit; utilizing changing definitions of
the elementary information unit such that said assigning of said elementary
information unit identifier is carried out a plurality of times using a plurality of
definitions; utilizing an exchangeable method of preprocessing, such that said
assigning of said elementary information unit identifier is carried out several
times; being omission resilient; being insertion resilient; being replacement
resilient; being dictionary based; being distribution based; being locality based;
being histogram based; and being n-gram based.
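
Two of the characteristics named in claim 121, "being n-gram based" and order sensitivity versus order insensitivity, can be illustrated with word n-grams hashed into identifiers. This is a sketch under assumptions: the use of word trigrams, SHA-256 truncated to 16 hex digits, and the function names are all choices made here for illustration.

```python
import hashlib
from typing import List, Set, Tuple


def ngrams(words: List[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return the consecutive word n-grams of a word list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def unit_identifier(gram: Tuple[str, ...], order_sensitive: bool = True) -> str:
    """Hash one n-gram into a short identifier.

    With order_sensitive=False the words are sorted first, so any
    permutation of the same words yields the same identifier.
    """
    parts = gram if order_sensitive else tuple(sorted(gram))
    digest = hashlib.sha256(" ".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


def identifiers(text: str, n: int = 3, order_sensitive: bool = True) -> Set[str]:
    """Return the set of identifiers for all word n-grams of `text`."""
    words = text.lower().split()
    return {unit_identifier(g, order_sensitive) for g in ngrams(words, n)}


original = identifiers("the merger will close in the third quarter")
edited = identifiers("the merger will probably close in the third quarter")
shared = original & edited
```

Inserting a single word disturbs only the n-grams that span the insertion point, so most identifiers of the original sentence still appear in the edited one. That locality is what gives n-gram schemes a degree of insertion resilience.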

122. A method according to claim 38, wherein an information object
policy comprises at least some information about one or more methods utilized
for assigning of an elementary information unit identifier to a respective
information object.

123. A method according to claim 117, wherein said assigning
utilizing different methods comprises utilizing said different methods
sequentially until a predetermined stop condition is reached.

124. A method according to claim 5, wherein said information object
comprises spreadsheet data, and wherein said assigning of said elementary
information unit identifier assigned to said information object comprises
utilizing a method comprising at least one of the following characteristics:

invariance to linear transformation; invariance to reordering; invariance
to permutation; resilience to linear transformation; resilience to reordering;
resilience to permutation; resilience to minor changes; resilience to cuts;
utilizing of statistic moment; utilizing of statistic moment for a table; utilizing
statistic moment for a row; utilizing statistic moment for a column; and utilizing
a mathematical descriptor of the information object data.
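
To illustrate how statistic moments can give a spreadsheet descriptor that is invariant to reordering and to linear transformation, as listed in claim 124, the sketch below uses standardized moments (skewness and kurtosis) per column. The choice of these two moments, the rounding, and the function names are assumptions made here, not details from the specification.

```python
import math
from typing import List, Tuple


def standardized_moments(column: List[float]) -> Tuple[float, float]:
    """Return (skewness, kurtosis) of a column.

    Both are unchanged by reordering the values and by any transformation
    x -> a*x + b with a > 0, because the mean and scale are divided out.
    """
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    if var == 0:
        return (0.0, 0.0)  # constant column: moments undefined, use a sentinel
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in column) / n
    kurt = sum(((x - mean) / std) ** 4 for x in column) / n
    return (skew, kurt)


def table_descriptor(columns: List[List[float]], digits: int = 6) -> List[Tuple[float, ...]]:
    """Describe a table by its sorted per-column moments (column order is ignored)."""
    return sorted(tuple(round(m, digits) for m in standardized_moments(c)) for c in columns)


table = [[1.0, 2.0, 4.0, 9.0], [10.0, 20.0, 20.0, 50.0]]
# Same data with rows reordered, columns swapped, and values rescaled:
# first column mapped by 2x + 3, second column multiplied by 10.
variant = [[500.0, 100.0, 200.0, 200.0], [21.0, 5.0, 7.0, 11.0]]
```

The descriptors of `table` and `variant` agree up to floating-point rounding, so a copy of a spreadsheet whose units were changed or whose rows were shuffled could still be matched to the original under these assumptions.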

125. A method according to claim 5, comprising utilizing said
elementary information unit identifiers for said information object identification
using a technique having at least one of the following characteristics: omission
resilience; insertion resilience; replacement resilience; being dictionary based;
being distribution based; being locality based; being based on the size of
elementary information units; being based on the size of information objects;
resilience to linear transformation; resilience to reordering; resilience to
permutation; resilience to minor changes; resilience to cuts; being histogram
based; and being n-gram based.

126. A method according to claim 1, further comprising utilizing a
client.

127. A method according to claim 126, wherein said client comprises
at least one of the following:

end point software; end point hardware; tamper resistant software;
tamper resistant hardware; client side software; and client side hardware.

128. A method according to claim 126, comprising utilizing said
client for at least one of the following:

monitoring of client side storage;
monitoring of client side access;
monitoring of client side usage;
monitoring of client side distribution;
monitoring of copying of information object excerpts;
monitoring of clipboard;
monitoring of at least one application;
monitoring of at least one interface;
control of at least one application;
control of at least one interface;
control of clipboard;
control of copying of information object excerpts;
control of client side storage;
control of client side access;
control of client side usage; and
control of client side distribution.

129. A method according to claim 1, comprising comparing
at least two information objects to calculate pairwise similarity between
said objects.



130. A method according to claim 129, comprising utilizing said
pairwise similarity to map said information objects to a space.

131. A method according to claim 130, wherein said space is a
Euclidean space, and wherein the closeness between any two objects within said
Euclidean space is approximately proportional to said pairwise similarity
between said information objects.

132. A method according to claim 130, wherein said space is a
weighted graph, and wherein the weight of an edge between any two objects
within said graph space is approximately proportional to said pairwise similarity
between said information objects.

133. A method according to claim 130, wherein said space is a graph,
and wherein the existence of an edge between any two objects within said graph
space is dependent on said pairwise similarity between said information objects.

134. A method according to claim 130, wherein said space is utilized
to identify at least one similarity information class, wherein said information
class consists of at least two information objects, wherein said information class
policy is a policy shared by the information class, and wherein said similarity
information class is bounded within said space.

135. A method according to claim 130, comprising utilizing said
space to identify at least one information object substantially similar to an
unidentified information object.

136. A method according to claim 130, comprising using said space to
identify at least one other information object substantially similar to an
information object for which policy is not known, thereby to obtain a policy
associated with said other information object to use as basis for a policy for said
information object.
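
Claims 129 to 136 describe computing pairwise similarity, mapping objects to a space such as a weighted graph, and borrowing a policy from a substantially similar object. A minimal sketch follows, assuming each information object is represented by a set of identifiers and that Jaccard similarity is used. Both are assumptions made here, as are the function names, the threshold and the sample policies.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Pairwise similarity of two identifier sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_graph(objects: Dict[str, Set[str]],
                     threshold: float) -> Dict[str, Dict[str, float]]:
    """Weighted graph: an edge exists only if similarity reaches the threshold."""
    graph: Dict[str, Dict[str, float]] = {name: {} for name in objects}
    names = sorted(objects)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            s = jaccard(objects[u], objects[v])
            if s >= threshold:
                graph[u][v] = s
                graph[v][u] = s
    return graph


def inherit_policy(unknown: Set[str], objects: Dict[str, Set[str]],
                   policies: Dict[str, str], threshold: float) -> Optional[str]:
    """Return the policy of the most similar known object, if similar enough."""
    best = max(objects, key=lambda name: jaccard(unknown, objects[name]), default=None)
    if best is None or jaccard(unknown, objects[best]) < threshold:
        return None
    return policies.get(best)


objects = {
    "contract_v1": {"a", "b", "c", "d"},
    "contract_v2": {"a", "b", "c", "e"},
    "newsletter": {"x", "y", "z"},
}
policies = {
    "contract_v1": "internal-only",
    "contract_v2": "internal-only",
    "newsletter": "public",
}
graph = similarity_graph(objects, threshold=0.5)
policy = inherit_policy({"a", "b", "c", "f"}, objects, policies, threshold=0.5)
```

In this example the two contract versions are linked by an edge of weight 0.6, the newsletter is isolated, and the unknown object inherits the policy of the contract it most resembles.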




137. A method according to claim 1, comprising storing information 
about said information object in a database. 



138. A method according to claim 1, further comprising extracting a
descriptor of said information object, based on statistical analysis of said
information object.

139. A method according to claim 1, comprising storing the order of
said elementary information units within said information object in a database.

140. A method according to claim 139, comprising using said order
for identification of said information object.


141. A method according to claim 1, further comprising interfacing at
least one of an information management system; and a document management
system.

142. A method according to claim 1, further comprising tracking at
least one of the following:

usage patterns; storage patterns; and distribution patterns.

143. A method according to claim 142, wherein said tracking is
carried out to infer information about at least one of the following:

normal usage patterns; normal storage patterns; normal distribution
patterns; irregular usage patterns; irregular storage patterns; and irregular
distribution patterns.

144. A method according to claim 143, wherein said inferred
information is used to define at least part of a policy.

145. A method according to claim 143, comprising using said inferred
information for information object classification.

146. A method according to claim 1, further comprising logging.

147. A method according to claim 146, wherein said logging
comprises logging of at least one of the following:

actions; events; and information objects identification.



148. A method according to claim 146, wherein at least part of said
logging is controlled by a policy.

149. A method according to claim 146, wherein at least part of said
logging is stored in a database.

150. A method according to claim 146, comprising utilizing said
logging to augment lifecycle information for said information object.


151. A method according to claim 1 , further comprising assessing the 
integrity of at least one information object, wherein said integrity assessment 
consists of comparing said information object with a version of said information 
object for which integrity is assured. 
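
By way of non-limiting illustration only, the comparison of claim 151 might be sketched as follows. The use of a SHA-256 digest as the means of comparison, and the function names `digest` and `integrity_is_satisfactory`, are assumptions made for this example and are not taken from the claims.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw bytes of an information object."""
    return hashlib.sha256(data).hexdigest()


def integrity_is_satisfactory(candidate: bytes, assured_version: bytes) -> bool:
    """Compare a candidate object with a version whose integrity is assured.

    Assumption: two objects are treated as identical when their digests match.
    """
    return digest(candidate) == digest(assured_version)


assured = b"Quarterly report, final version."
print(integrity_is_satisfactory(b"Quarterly report, final version.", assured))
print(integrity_is_satisfactory(b"Quarterly report, edited version.", assured))
```

In such a sketch, a negative result could be used as the condition for withholding distribution, storage or usage of the information object.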

152. A method according to claim 151, further comprising issuing a
certificate of said integrity for at least one information object. 



153. A method according to claim 152, wherein said certificate is a 
cryptographic certificate.

154. A method according to claim 151, further comprising replacing
said information object with said version of said information object for which
said integrity is assured.

155. A method according to claim 151, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing distribution of said information object.

156. A method according to claim 151, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing storage of said information object.



157. A method according to claim 151, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing usage of said information object.

158. A method according to claim 1, further comprising defining at
least one constituent information object to be an ignored information object, and
wherein, whenever said ignored information object is an element of a
compound information object, ignoring said object in identification of said
compound information object.

159. A method according to claim 38, further comprising changing
access control information in accordance with said policy.

160. A method according to claim 1, further comprising not allowing
usage of respective ones of said information objects outside an organization.

161. A method according to claim 1, further comprising not allowing
storage of respective ones of said information objects outside an organization.

162. A method according to claim 1, further comprising not allowing
distribution of respective ones of said information objects outside an
organization.

163. A method according to claim 40, wherein said policy comprises
at least one mandatory lifecycle.



164. A method according to claim 163, wherein said action is 
dependent on the matching of said mandatory lifecycle with a lifecycle of a 
respective event.

165. A method according to claim 163, wherein said mandatory
lifecycle comprises at least one mandatory recipient of said information object;
and an order of events concerning said information object.


166. A method according to claim 44, wherein said inserting an 
additional part to said information object comprises inserting at least one of the 
following: a header; a footer; and a disclaimer. 

167. A method according to claim 38 or claim 52, comprising
defining areas and wherein said policy is dependent on whether an action is
taken inside a user-defined area.

168. A method according to claim 38 or claim 52, comprising
defining areas and wherein said policy is dependent on whether an event occurs
inside a user-defined area.

169. A method according to claim 1 or claim 5, comprising using said
deducing to locate at least one information object with similar content to a given
information object.

170. A method according to claim 38, comprising attaching a 
respective policy to information objects according to their logical location 
within an information storage medium. 

171. A method according to claim 170, further comprising utilizing a
crawler for automatic location of information objects.

172. A method according to claim 171, wherein said information
storage medium is a file system.

173. A method according to claim 169, wherein said locating is done
in an information storage medium.

174. A method according to claim 173, further comprising utilizing a
crawler for automatic location of information objects within said information
storage medium.

175. A method according to claim 173, wherein said information
storage medium comprises at least one file system.

176. A method for information identification comprising:
finding elementary information units within said information object;
and
deducing information about the identity of said information object from
identification of said elementary information units found within said
information object.
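
By way of non-limiting illustration only, the two steps of claim 176 might be sketched as follows. The choice of sentences as the elementary information units, the voting rule used for the deducing step, and all function names are assumptions made for this example.

```python
import re
from collections import Counter


def elementary_units(text):
    """Find elementary information units; here, normalized sentences (an assumption)."""
    parts = re.split(r"[.!?]+", text)
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def build_index(known_objects):
    """Map each known unit to the names of the information objects containing it."""
    index = {}
    for name, text in known_objects.items():
        for unit in elementary_units(text):
            index.setdefault(unit, set()).add(name)
    return index


def deduce_identity(text, index):
    """Deduce the most likely identity from the units found in the text."""
    votes = Counter()
    for unit in elementary_units(text):
        for name in index.get(unit, ()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None


known = {
    "contract": "The fee is ten dollars. Payment is due in May.",
    "memo": "Lunch is at noon. Bring a pen.",
}
index = build_index(known)
print(deduce_identity("FYI. Payment is due in May!", index))
```

Here an excerpt that reuses one sentence of a known object is attributed to that object, while text sharing no unit with any known object yields no identification.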



177. A method according to claim 176, wherein said information
objects comprise at least one simple information object, said simple information
object comprising one of the following:
an elementary information unit;
a set of elementary information units; and
an ordered set of elementary information units.

178. A method according to claim 176, wherein said elementary
information units comprise at least one of the following:

a sentence; a sequence of words; a word; a sequence of characters; a
character; a sequence of numbers; a number; a sequence of digits; a digit; a
vector; a curve; a pixel; a block of pixels; an audio frame; a musical note; a
musical bar; a visual object; a sequence of video frames; a sequence of musical
notes; a sequence of musical bars; and a video frame.

179. A method according to claim 176, further comprising assigning
elementary information unit identifiers to elementary information units after
identification.

180. A method according to claim 179, wherein said elementary
information unit identifiers are utilized in said deducing.




181. A method according to claim 176, wherein said information
object identification is carried out on an instance of said information object, said
information object instance being said information object in a specific format.

182. A method according to claim 181, wherein said format comprises
at least one of the following:

jpeg image; gif image; Word document format; Lotus notes format;
mpeg format; text format; rich text format; Unicode text format; multi byte text
encoding format; formatted text format; ASCII text format; HTML; XML;
PDF; postscript; MS-Excel spreadsheet; MS-Excel drawing; MS-Visio drawing;
Photoshop drawing; AutoCAD drawing format; and CAD drawing format.

183. A method according to claim 179, wherein said elementary
information unit identifiers are determined by the content of said elementary
information units which they are assigned to.

184. A method according to claim 183, wherein said elementary
information unit identifiers are solely determined by said content.
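
By way of non-limiting illustration only, an identifier that is solely determined by the content of an elementary information unit might be sketched as a cryptographic hash of that content. The use of SHA-256, the whitespace and case normalization, and the name `content_identifier` are assumptions made for this example.

```python
import hashlib


def content_identifier(unit: str) -> str:
    """Derive an identifier from the unit's content and nothing else.

    Assumption: case and runs of whitespace are not considered part of the content.
    """
    canonical = " ".join(unit.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(content_identifier("Payment is due in May"))
```

Because no location or context enters the computation, the same unit receives the same identifier wherever it appears.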

185. A method according to claim 179, wherein said elementary
information unit identifiers are at least partly determined by locations within an
information object of respective elementary information units to which they are
assigned.

186. A method according to claim 179, wherein said elementary
information unit identifiers are at least partly determined by the content of an
elementary information unit in proximity to said elementary information units
to which they are assigned.

187. A method according to claim 179, comprising storing said
elementary information unit identifiers in a database.

188. A method according to claim 187, further comprising using said
elementary information unit identifiers stored in said database for identifying
at least one further, unidentified, information object.

189. A method according to claim 187, further comprising using said
elementary information unit identifiers stored in said database for comparing
information objects.

190. A method according to claim 179, comprising storing only some
of said elementary information unit identifiers in a database.

191. A method according to claim 190, wherein said storing of only
some of said elementary information unit identifiers in a database is to achieve
at least one of the following:

reduce storage cost;

increase efficiency of assigning of said elementary information unit
identifiers to said elementary information units by only performing said
assignment for elementary information unit identifiers that are stored in said
database; and

increase the efficiency of searching for said elementary information
unit identifiers in said database.



192. A method according to claim 190, wherein said storage of only
some of said elementary information unit identifiers in a database is done in a
manner that ensures that any area of a given size in said information object
contains a predetermined minimum number of said stored elementary
information units.
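
By way of non-limiting illustration only, one known selection scheme with the property recited in claim 192 is window-based minimum selection (often called winnowing): the unit with the smallest hash in every window of consecutive units is stored, so every window contains at least one stored unit. The claims do not name this technique; it is swapped in here as an assumption, with a minimum number of one, and the function names are hypothetical.

```python
import hashlib


def unit_hash(unit: str) -> int:
    """Hash a unit to an integer (first 8 bytes of SHA-256, an assumption)."""
    digest = hashlib.sha256(unit.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_positions(units, window):
    """Return positions of units to store; every window holds at least one."""
    hashes = [unit_hash(u) for u in units]
    if not hashes:
        return []
    chosen = set()
    last_start = max(0, len(hashes) - window)
    for start in range(last_start + 1):
        end = min(start + window, len(hashes))
        # Keep the position with the smallest hash inside this window.
        chosen.add(min(range(start, end), key=hashes.__getitem__))
    return sorted(chosen)


units = [f"unit {i}" for i in range(50)]
kept = select_positions(units, window=5)
print(len(kept), "of", len(units), "identifiers stored")
```

The window length plays the role of the given size; making it depend on properties such as confidentiality level would be a further design choice.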

193. A method according to claim 192, wherein said given size is
dependent on properties of a respective information object.

194. A method according to claim 193, wherein said properties of
said information object comprise at least one of the following:

importance; size; confidentiality level; and format.

195. A method according to claim 192, wherein said minimum
number is dependent on properties of said information object.

196. A method according to claim 195, wherein said properties of said
information object comprise at least one of the following:
importance; size; confidentiality level; and format.

197. A method according to claim 177, wherein said information
objects comprise at least one compound information object, said compound
information object comprising at least one of the following:

a simple information object; a compound information object; an ordered
set of compound information objects; an ordered set of simple information
objects; and an ordered set of compound and simple information objects.

198. A method according to claim 176, wherein said information
comprises at least one of the following:

numeric data; spreadsheet data; numeric spreadsheet data; textual
spreadsheet data; word processor data; textual data; hyper text data; audio data;
visual data; multimedia data; binary data; raw data; database data; video data;
drawing data; chart data; picture data; and image data.

199. A method according to claim 176, comprising using said
deducing to attach to said information object an information object policy, said
policy comprising at least one of the following:

an allowed distribution of said information object;

a restriction on distribution of said information object;

an allowed storage of said information object;

a restriction on storage of said information object;

an action to be taken as a reaction to an event;

an allowed usage of said information object; and

a restriction on usage of said information object.




200. A method according to claim 199, wherein said information
object policy comprises at least one action to be taken as a reaction to an event,
and wherein said action comprises at least one of the following:

preventing distribution of said information object; preventing storage of
said information object; preventing usage of said information object; reporting
distribution of said information object; reporting storage of said information
object; reporting usage of said information object; reporting; alerting about
distribution of said information object; alerting about storage of said information
object; alerting about usage of said information object; alerting; logging distribution
of said information object; logging storage of said information object; logging
usage of said information object; logging; notifying about distribution of said
information object; notifying about storage of said information object; notifying
about usage of said information object; notifying; notifying an administrator;
notifying a manager; notifying a recipient; notifying a sender; notifying
an owner of said information object; quarantine; alerting an administrator;
alerting a manager; alerting a recipient; alerting a sender; alerting an owner of
said information object; reporting to an administrator; reporting to a manager;
reporting to a recipient; reporting to a sender; reporting to an owner of said
information object; encrypting said information object; changing said
information object; replacing said information object; and utilizing digital
rights management technology on said information object.

201. A method according to claim 199, wherein said information
object policy comprises at least one action to be taken as a reaction to an event,
and wherein said event comprises at least one of the following:

attempted distribution of said information object; attempted storage of
said information object;

attempted usage of said information object; distribution of said
information object; storage of said information object; and usage of said
information object.

202. A method according to claim 199, wherein said information
object usage comprises at least one of the following:
copying an excerpt; editing; copying to clipboard; copying an excerpt to
clipboard; changing format; changing encoding; encryption; decryption;
ranging digital management; opening by an application; and printing.

203. A method according to claim 199, wherein said information
object policy comprises placing a substantially imperceptible marking in said
information object, said marking comprising information content, and said
method comprising placing said marking, when indicated by said policy, before
allowing at least one of the following:

storage of said information object; usage of said information object; and
distribution of said information object.

204. A method according to claim 203, wherein said information
content for storage in said marking comprises at least one of the following:
the identity of said information object;

the identity of a user performing the action in respect to said information
object;

the identity of a user authorizing the action in respect to said information
object;

the identity of a user overriding policy and approving the action in
respect to said information object; and

the identity of a user requesting the action in respect to said information
object.

205. A method according to claim 199, wherein said information
object policy further comprises changing said information object by at least one
of the following:

deleting part of said information object; replacing part of said
information object; and inserting an additional part to said information object
before allowing at least one of the following actions:

storage of said information object; usage of said information object; and
distribution of said information object.

206. A method according to claim 205, wherein said changing of said
information object is done in order to eliminate parts having policies that do not
allow for said action to be executed while they are in the document.

207. A method according to claim 205, wherein said changing of said
information object is carried out in order to personalize said information object.

208. A method according to claim 205, wherein said changing of said
information object is carried out in order to customize said information object
for a specific use.

209. A method according to claim 205, wherein said changing of said
information object is done in a manner selected to achieve at least one of the
following:

preserving the coherency of said information object; seamlessness;
preserving the structure of said information object; preserving the linguistic
coherency of said information object; preserving the formatting style of said
information object; and preserving the pagination style of said information object.

210. A method according to claim 205, wherein said information
objects comprise compound information objects and wherein said changing of
said information object is made to constituent parts of a compound information
object.

211. A method according to claim 199, wherein said storing
comprises storage in at least one of the following:

a portable media device; a floppy disk; a hard drive; a portable hard
drive; a flash card; a flash device; disk on key; magnetic tape; magnetic media;
optic media; punched cards; a machine readable media; a CD; a DVD; a
firewire device; a USB device; and a hand held computer.

212. A method according to claim 199, wherein said policy comprises
distribution regulation, said distribution regulation being for regulating at least
one of the following:

sending said information object via mail;

sending said information object via web mail;

uploading said information object to a web server;

uploading said information object to a FTP server;

sending said information object via a file transfer application;

sending said information object via an instant messaging application;

sending said information object via a file transfer protocol; and

sending said information object via an instant messaging protocol.

213. A method according to claim 199, wherein said policy is
dependent on at least one of the following:

the domain of a respective information object; the identity of a system;
the identity of a user; the identity level of a user authorizing an action; the
identity of a user requesting an action; the identity of a user involved in an
action; the identity of a user receiving an information object; the authentication
level of a system; the authentication level of a user; the authentication level of a
user requesting an action; the authentication level of a user authorizing an
action; the authentication level of a user involved in an action; the
authentication level of a user receiving said information object; the
authentication level of a user sending said information object; the format of an
information object instance; an interface being used; an application being used;
encryption being used; digital rights management technology being used;
detection of transformation, wherein said transformation is operable to reduce
the ability to identify said transformed information object; information object
integrity; regular usage pattern; regular distribution pattern; regular storage
pattern; information path; consistency of an action with usage pattern; the
identity of a user overriding policy and authorizing the action in respect to said
information object; the authentication level of a user overriding policy and
authorizing the action in respect to said information object; the identity of a user
sending an information object; information property of said information object;
language of said information object; representation of said information object;
operations done on said information object; identity of users involved along
the life cycle of said information object; application used on said information
object; transition channel of said information object; participant agents; virtual
location of a computer; logical location of a computer; physical location of a
computer; type of a computer; type of a laptop computer; type of a desktop
computer; type of a server computer; and owner identity.

214. A method according to claim 199, further comprising enabling at
least one user to override at least one of decisions contained within said policy.

215. A method according to claim 176, wherein said deducing
comprises utilizing conditional probabilities for at least one of the following:

identification of information objects;
classification of information objects; and
identification of a knowledge domain of information objects.
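
By way of non-limiting illustration only, the use of conditional probabilities for classification might be sketched with a naive Bayes estimate, in which the probability of each elementary information unit given a class is estimated from labeled examples. The naive Bayes model, the add-one smoothing, and all names and sample data are assumptions made for this example.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (class_label, list_of_units) pairs."""
    unit_counts, class_counts, vocabulary = {}, Counter(), set()
    for label, units in samples:
        class_counts[label] += 1
        unit_counts.setdefault(label, Counter()).update(units)
        vocabulary.update(units)
    return unit_counts, class_counts, vocabulary


def classify(units, model):
    """Return the class maximizing log P(class) + sum of log P(unit | class)."""
    unit_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        counts = unit_counts[label]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(count / total)
        for unit in units:
            # Add-one smoothing keeps unseen units from zeroing the estimate.
            score += math.log((counts[unit] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("finance", ["invoice", "payment", "total"]),
    ("finance", ["payment", "due"]),
    ("hr", ["salary", "leave", "employee"]),
    ("hr", ["employee", "review"]),
])
print(classify(["payment", "invoice"], model))
```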

216. A method according to claim 199, wherein at least part of said
policy is stored in a database.

217. A method according to claim 176, wherein said deducing further
comprises utilizing keywords for at least one of the following:

identification of information objects; identification of elementary
information units; classification of information objects; and identification of the
domain of information objects.

218. A method according to claim 217, wherein said keywords are
stored in a database.

219. A method according to claim 217, wherein said keywords are
stored in at least one of the following forms:

hash value; raw string; and numeric representation.

220. A method according to claim 199, wherein at least part of said
policy is defined in terms of a logic expression.

221. A method according to claim 220, wherein said expression is
evaluated by lazy evaluation.
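
By way of non-limiting illustration only, a policy logic expression evaluated lazily might be sketched as a composition of predicates in which later operands are not evaluated once the result is already determined. The combinators `all_of` and `any_of`, the predicates, the context keys and the domain `corp.example` are hypothetical names introduced for this example.

```python
def all_of(*conditions):
    """Logical AND; all() stops at the first false operand."""
    return lambda ctx: all(cond(ctx) for cond in conditions)


def any_of(*conditions):
    """Logical OR; any() stops at the first true operand."""
    return lambda ctx: any(cond(ctx) for cond in conditions)


def is_internal(ctx):
    return ctx["recipient_domain"] == "corp.example"


lookups = []  # records when the costly external function is actually called


def in_legal_group(ctx):
    """Stands in for an external function based on group membership."""
    lookups.append(ctx["user"])
    return ctx["user"] in {"alice"}


allow_send = any_of(is_internal, in_legal_group)

print(allow_send({"recipient_domain": "corp.example", "user": "bob"}))
print(lookups)  # still empty: the external lookup was skipped
```

The practical benefit in such a sketch is that external functions, such as directory queries, are invoked only when their result can still affect the decision.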

222. A method according to claim 220, wherein at least some of the
variables in said logic expression comprise at least one of the following:

an external function; an external function based on group membership;
and an external variable.

223. A method according to claim 199, wherein at least part of said
policy is defined in terms of any one of a group comprising rules, imposed
restrictions, granted privileges, reaction to one or more given events, group
operations, a property of said information object, a property of a user, a property
of a computer, a property of an entity, and a hierarchy of calculations.

224. A method according to claim 199, wherein at least part of said
policy is defined in terms of a role, wherein said role consists of a property of at
least one of a user and a system and wherein said role further comprises at least
one authorization.

225. A method according to claim 199, wherein at least part of said
policy is defined in terms of at least one of the following languages:

a scripting language; an ordered calculation language; a programming
language; an interpreted language; and a functional language.

226. A method according to claim 225, wherein said at least one of
said following languages comprises instructions for the operation of an ordered
calculation resulting in at least one of the following:

policy; instruction to perform an action; restriction; and allowance.

227. A method according to claim 199, wherein said information
object is a compound information object comprising constituent simple
information objects, and a respective policy assigned to said information object
comprises different policies for at least some of said constituent information
objects.

228. A method according to claim 176, wherein at least one user is
defined in an owner definition as an owner of said information object.

229. A method according to claim 228, wherein said owner definition
is stored in a database.

230. A method according to claim 176, wherein said deducing further
comprises utilizing organizational structure information.

231. A method according to claim 230, wherein said organizational
structure information comprises at least one of the following:

user superiority; working groups; organizational hierarchy; departmental
separation; and membership in working groups.

232. A method according to claim 199, comprising using
organizational structure information in order to assign a respective policy
object.

233. A method according to claim 230, wherein at least part of said
organizational structure information is stored in a database.

234. A method according to claim 230, wherein at least part of said
organizational structure information is used for information object
classification.

235. A method according to claim 230, wherein at least part of said
organizational structure information is imported from at least one of the
following:

organizational data system; data management system; organizational
data management system;
knowledge management system; user directory; LDAP server;
document; and an organizational chart.

236. A method according to claim 176, further comprising making
use of at least one user interface operable to assist in at least one of the
following:

classification; policy definition; template definition; approving and
revising automatic template definition; importing organizational structure
information; revising organizational structure information; producing reports;
overriding policy decisions; and providing authorizations.

237. A method according to claim 199, comprising defining an
information class as a group consisting of at least two information objects, said
defining further comprising associating with said information class a
corresponding class policy being a policy shared by said information objects.

238. A method according to claim 237, wherein said information class
policy comprises at least a part of respective policies of said information objects
within said class.

239. A method according to claim 176, further comprising using
template information objects to represent commonly repeated information, such
that a template information object together with a difference information object
representing instance specific information are together formable to produce a
compound information object in which common and specific information are
respectively identifiable.

240. A method according to claim 239, comprising using said
template information object in identifying any unknown information object
comprising information corresponding to said template information object.

241. A method according to claim 239, wherein said template
information object is a compound information object, wherein said template
information object comprises at least one placeholder, and wherein said method
comprises replacing said placeholder by at least part of said difference
information object when said difference information object and a respective
template information object are combined.
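
By way of non-limiting illustration only, the combination of a template information object with a difference information object might be sketched as follows. The `{{name}}` placeholder syntax, the representation of the difference object as a dictionary, and the function names are assumptions made for this example; placeholder names are assumed to be unique within a template.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def combine(template: str, difference: dict) -> str:
    """Replace each placeholder by the matching part of the difference object."""
    return PLACEHOLDER.sub(lambda m: difference[m.group(1)], template)


def extract_difference(template: str, instance: str):
    """Recover the instance-specific parts, or None if the instance does not fit."""
    # With a capturing group, split() alternates literal text and placeholder names.
    parts = PLACEHOLDER.split(template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(pattern, instance)
    return match.groupdict() if match else None


template = "Dear {{name}}, your invoice total is {{amount}}."
difference = {"name": "Dana", "amount": "42 USD"}
combined = combine(template, difference)
print(combined)
print(extract_difference(template, combined))
```

The second function shows the reverse direction, in which common and instance-specific information remain separately identifiable in the compound object.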

242. A method according to claim 241, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising specialization information to identify a respective specialization of
said specialized placeholder.

243. A method according to claim 241, wherein at least one of said
placeholders is a specialized placeholder, said specialized placeholder
comprising a restriction about information objects permitted for replacing said
specialized placeholder.

244. A method according to claim 239, wherein said template
information object comprises at least one of a group comprising: a disclaimer; a
form; a header; a footer; a contract; and an invoice.

245. A method according to claim 242, wherein said specialized
placeholder comprises a restriction about information objects permitted for
replacing said specialized placeholder, and wherein said restriction comprises a
rule for excluding at least one of the following:

an object comprising numeric information;

an object comprising a word;

an object comprising a character;

an object comprising a digit;

an object comprising a sentence; and

an object comprising a simple information object.

246. A method according to claim 239, comprising defining a
template information object and wherein said defining comprises automatically
identifying a template information object candidate.

247. A method according to claim 246, wherein said automatically
identifying a template information object candidate comprises identification of
shared elementary information units of at least two information objects.

248. A method according to claim 246, wherein said step of
automatically identifying a template information object candidate comprises
identification of substantially similar information objects.

249. A method according to claim 246, wherein said step of
automatically identifying a template information object candidate comprises the
use of at least one of text parsing; and text matching.

250. A method according to claim 239, comprising deriving at least a
part of a respective information object policy associated with a template
instance information object from an information object policy of the respective
originating template information object.

251. A method according to claim 199, wherein at least a part of an
information object policy of a respective information object is derived from a
default information object policy when said part of said information object
policy of said information object is not explicitly defined.
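
By way of non-limiting illustration only, the derivation of undefined policy parts from a default policy might be sketched as a dictionary merge in which explicitly defined parts take precedence. The policy keys and values shown, and the name `effective_policy`, are hypothetical and introduced for this example.

```python
# Hypothetical default information object policy.
DEFAULT_POLICY = {
    "distribution": "internal-only",
    "storage": "allowed",
    "usage": "allowed",
}


def effective_policy(explicit: dict) -> dict:
    """Fill every part that is not explicitly defined from the default policy."""
    return {**DEFAULT_POLICY, **explicit}


print(effective_policy({"distribution": "blocked"}))
```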

252. A method according to claim 179, comprising applying
preprocessing to said elementary information units before assigning identifiers
thereto.

253. A method according to claim 252, wherein said preprocessing is
done in order to enhance at least one of efficiency and robustness.

254. A method according to claim 252, wherein said preprocessing
comprises at least one of canonization; removal of common words; removal of
words not having a substantial effect on the meaning of the text; removal of
punctuation; correction of spelling; canonization of spelling; scene detection;
canonizing size; canonizing orientation; canonizing color; removing color;
reducing noise; enhancing area separation; enhancing borders; enhancing lines;
sharpening; blurring; removal of elementary information units substantially
similar to neighboring elementary information units; canonization of grammar;
and transformation to a phonetic representation.
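
By way of non-limiting illustration only, three of the textual preprocessing steps listed in claim 254 (removal of punctuation, canonization of case and spacing, and removal of common words) might be sketched as follows. The list of common words and the name `preprocess` are assumptions made for this example.

```python
import string

# Hypothetical list of common words; a real list would be language specific.
COMMON_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is"}

_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def preprocess(unit: str) -> str:
    """Canonize a textual unit before an identifier is assigned to it."""
    words = unit.translate(_STRIP_PUNCTUATION).lower().split()
    return " ".join(w for w in words if w not in COMMON_WORDS)


print(preprocess("The  Price, of the item: is $40!"))
```

Two units that differ only in punctuation, case, spacing or common words then yield the same preprocessed text, which is one way such preprocessing can enhance robustness.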


255. A method according to claim 252, comprising carrying out said
preprocessing so as to ensure that any area of a given size in said information
object contains at least a predetermined number of said elementary information
units having an assigned elementary information unit identifier.

256. A method according to claim 255, wherein said given size is
dependent on properties of said information object.

257. A method according to claim 256, wherein said properties of said
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

258. A method according to claim 255, wherein said predetermined
number is dependent on properties of said information object.

259. A method according to claim 258, wherein said properties of said
information object comprise at least one of a group comprising: importance;
size; confidentiality level; and format.

260. A method according to claim 176, further comprising a stage of
detection of information objects having undergone transformations.

261. A method according to claim 260, wherein said stage of detection
is aimed at detection of transformations intended to reduce the ability to
identify said information object.
262. A method according to claim 260, wherein said detection of
information objects that have undergone transformation comprises detection of
at least one of a group comprising:

transformation artifacts; spelling mistakes; wrong grammar; wrong
punctuation; wrong capitalization; missing punctuation; missing capitalization;
irregular word distribution; lack of common words; predominance of unknown
words; inconsistent headers; headers inconsistent with file type; headers
inconsistent with file content; file type inconsistent with file content; irregular
distribution of characters; irregular distribution of words; irregular distribution
of character sequences; irregular distribution of word sequences; irregular
length of words; irregular length of sentences; irregular distribution of length of
words; irregular distribution of length of sentences; irregular file format;
irregular file encoding; unknown file format; unknown file encoding; mix of
non-alphabetic characters; unopenable file; action time; information object
creation time; information object update time; encryption; and an unexpectedly
high level of entropy.
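
One of the indicators listed in claim 262, an unexpectedly high level of entropy, can be illustrated with a byte-level Shannon entropy estimate. This is a sketch only: the function names and the threshold of 7.5 bits per byte are assumptions chosen for illustration and do not appear in the claims.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_transformed(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy exceeds an assumed threshold; encrypted or
    compressed content tends to approach 8 bits per byte."""
    return shannon_entropy(data) > threshold


print(looks_transformed(b"plain english text " * 10))
print(looks_transformed(bytes(range(256))))
```

In practice such a test would be only one signal among the others in the list, since legitimately compressed formats also show high entropy.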

263. A method according to claim 179, comprising formulating
respective assigned elementary information unit identifier to be resilient to
small errors.

264. A method according to claim 179, wherein said assigning of
elementary information unit identifier utilizes image matching.

265. A method according to claim 179, wherein said assigning of
elementary information unit identifier comprises a mapping to a Euclidean
space.

266. A method according to claim 265, wherein said mapping to a
Euclidean space comprises approximating a pairwise difference between
elementary information units.
267. A method according to claim 266, wherein said approximating is
such that a difference between two elementary information units approximates
said pairwise difference between said two elementary information units.

268. A method according to claim 266, wherein said approximation of
said pairwise difference between elementary information units comprises an
approximation of at least one of the following:

semantic difference; distance measured by image matching; phonetic
difference; and spelling difference.

269. A method according to claim 176, wherein said information 
object is a knowledge object.

270. A method according to claim 176, wherein said elementary 
information unit is an elementary fact. 

271. A method according to claim 270, wherein said elementary fact 
comprises at least one of the following: 

sentence; database entry; representation independent description of 
knowledge; modular description of knowledge; and abstract description of 
knowledge. 

272. A method according to claim 237, wherein said information class 
is a knowledge class. 

273. A method according to claim 176, further comprising a stage of 
discerning lifecycle information about a respective information object.

274. A method according to claim 273, wherein said discerning of
information about the lifecycle of said information object comprises utilizing
information about sharing of at least one elementary information unit in said
information object, wherein said elementary information unit is shared with at
least one additional information object.
275. A method according to claim 273, wherein said discerning of
information about the lifecycle of said information object is based on at least
one of a group comprising: file system date information; information about
editing of said information object; and information about registration of said
information object.

276. A method according to claim 273, comprising utilizing said 
information about the lifecycle of said information object for the creation of a 
lifecycle graph. 
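
A lifecycle graph of the kind referred to in claims 274 to 276 could, for example, be derived from shared elementary information units and date information. The sketch below assumes each information object is described by a timestamp and a set of unit identifiers, and links each object to the most recent earlier object that shares at least one unit; this linking rule, the data layout and the function name are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def lifecycle_graph(objects: Dict[str, Tuple[float, Set[str]]]) -> List[Tuple[str, str]]:
    """Return directed edges (earlier, later) between objects sharing a unit.

    `objects` maps an object name to (timestamp, set of unit identifiers).
    """
    # Order objects by time (name breaks ties so the result is deterministic).
    ordered = sorted(objects, key=lambda name: (objects[name][0], name))
    edges: List[Tuple[str, str]] = []
    for i, later in enumerate(ordered):
        # Walk backwards in time and link to the nearest predecessor sharing a unit.
        for earlier in reversed(ordered[:i]):
            if objects[earlier][1] & objects[later][1]:
                edges.append((earlier, later))
                break
    return edges


history = {
    "draft": (1.0, {"a", "b"}),
    "review": (2.0, {"b", "c"}),
    "memo": (2.5, {"x"}),
    "final": (3.0, {"c", "d"}),
}
print(lifecycle_graph(history))
```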

277. A method according to claim 273, comprising utilizing said 
information about the lifecycle of said information object to define at least part 
of the policy of said information object, said utilizing comprising identifying at
least one other information object along said information object's lifecycle and
examining a policy associated therewith.

278. A method according to claim 179, wherein said assigning of said 
elementary information unit identifier is carried out a plurality of times, each 
time utilizing a different method for assigning of an elementary information unit
identifier.

279. A method according to claim 278, wherein said assigning of
elementary information unit identifier several times comprises storing
elementary information unit identifiers assigned utilizing different methods
separately.

280. A method according to claim 278, wherein said assigning of said
elementary information unit identifier several times comprises storing
elementary information unit identifiers assigned utilizing different methods such
that they can be distinguished according to said method utilized to assign them.

281. A method according to claim 278, wherein said different
methods are selected such as to optimize between at least any two of the
following:

storage space; search speed; capability to detect transformation;
capability to detect a specific transformation; resilience to transformation;
resolution of identification from among similar information objects; resolution
of identification of boundaries within compound information objects; resilience
to a specific transformation; and resilience to transformation.

282. A method according to claim 179, wherein said assigning of a 
respective elementary information unit identifier comprises utilizing a method 
having at least one of the following characteristics: 

order sensitive to data in the elementary information unit; order
insensitive in the elementary information unit; utilizing changing definitions of
the elementary information unit such that said assigning of said elementary
information unit identifier is carried out a plurality of times using a plurality of
definitions; utilizing an exchangeable method of preprocessing, such that said
assigning of said elementary information unit identifier is carried out several
times; being omission resilient; being insertion resilient; being replacement 
resilient; being dictionary based; being distribution based; being locality based; 
being histogram based; and being n-gram based. 
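
Two of the characteristics listed in claim 282, order sensitivity and being n-gram based, can be illustrated together. The sketch below assumes word n-grams as elementary information units and a truncated SHA-256 digest as the identifier; both choices, and the function name, are assumptions made for illustration rather than the method of the claims.

```python
import hashlib
from typing import List


def ngram_identifiers(text: str, n: int = 3, order_sensitive: bool = True) -> List[str]:
    """Assign an identifier to every word n-gram in `text`.

    With order_sensitive=False the words of each n-gram are sorted before
    hashing, so reordering words inside an n-gram yields the same identifier.
    """
    words = text.lower().split()
    identifiers = []
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        if not order_sensitive:
            gram = sorted(gram)
        digest = hashlib.sha256(" ".join(gram).encode("utf-8")).hexdigest()
        identifiers.append(digest[:16])  # truncated only to keep identifiers short
    return identifiers


same = ngram_identifiers("the quick brown", order_sensitive=False) == \
    ngram_identifiers("quick the brown", order_sensitive=False)
print(same)
```

The order-insensitive variant trades some resolution for resilience to reordering, which is the kind of trade-off claim 281 describes.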

283. A method according to claim 199, wherein an information object
policy comprises at least some information about one or more methods utilized
for assigning of an elementary information unit identifier to a respective
information object.

284. A method according to claim 278, wherein said assigning
utilizing different methods comprises utilizing said different methods
sequentially until a predetermined stop condition is reached.
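
The sequential use of different methods in claim 284 can be sketched as a loop that stops at the first method producing a known identifier. The stop condition used here (first match against a table of known identifiers), the two example methods and all names are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple


def exact(unit: str) -> str:
    """Identifier method 1 (assumed): the unit text itself."""
    return unit


def normalized(unit: str) -> str:
    """Identifier method 2 (assumed): lower-cased, whitespace-collapsed text."""
    return " ".join(unit.lower().split())


def identify_sequentially(
    unit: str,
    methods: List[Tuple[str, Callable[[str], str]]],
    known: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Try each method in turn; stop when an identifier is found in `known`."""
    for name, method in methods:
        identifier = method(unit)
        if identifier in known:  # assumed stop condition: first successful match
            return name, known[identifier]
    return None


known_units = {"quarterly revenue forecast": "finance-report"}
pipeline = [("exact", exact), ("normalized", normalized)]
print(identify_sequentially("Quarterly  Revenue Forecast", pipeline, known_units))
```

Ordering the methods from cheapest to most tolerant keeps the common case fast while still catching lightly altered units.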

285. A method according to claim 179, wherein said information
object comprises spreadsheet data, and wherein said assigning of said
elementary information unit identifier assigned to said information object
comprises utilizing a method comprising at least one of the following
characteristics:

invariance to linear transformation; invariance to reordering; invariance
to permutation; resilience to linear transformation; resilience to reordering;
resilience to permutation; resilience to minor changes; resilience to cuts;
utilizing a statistical moment; utilizing a statistical moment for a table; utilizing
a statistical moment for a row; utilizing a statistical moment for a column; and
utilizing a mathematical descriptor of the information object data.
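
As an illustration of a statistical-moment descriptor for a spreadsheet column (claim 285), standardized skewness and kurtosis are unchanged when the values are reordered, and unchanged, up to rounding, when every value x is replaced by a*x + b with a > 0. The sketch below is one possible descriptor under those assumptions; the function name and the rounding precision are illustrative and not part of the claims.

```python
import math
from typing import Sequence, Tuple


def column_descriptor(values: Sequence[float], digits: int = 6) -> Tuple[float, float]:
    """Return (skewness, kurtosis) of a column using population moments."""
    ordered = sorted(values)  # sorting makes the result independent of input order
    n = len(ordered)
    if n == 0:
        raise ValueError("column must not be empty")
    mean = sum(ordered) / n
    centered = [v - mean for v in ordered]
    m2 = sum(c ** 2 for c in centered) / n
    if m2 == 0.0:
        return (0.0, 0.0)  # constant column: higher moments carry no information
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return (round(m3 / m2 ** 1.5, digits), round(m4 / m2 ** 2, digits))


column = [1.0, 2.0, 4.0, 8.0, 16.0]
rescaled = [3.0 * v + 10.0 for v in column]
print(column_descriptor(column), column_descriptor(rescaled))
```

A negative scale factor would flip the sign of the skewness, so a descriptor meant to cover that case could store its absolute value instead.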

286. A method according to claim 179, comprising utilizing said
elementary information unit identifiers for said information object identification
using a technique having at least one of the following characteristics: omission
resilience; insertion resilience; replacement resilience; being dictionary based;
being distribution based; being locality based; being based on the size of
elementary information units; being based on the size of information objects;
resilience to linear transformation; resilience to reordering; resilience to
permutation; resilience to minor changes; resilience to cuts; being histogram
based; and being n-gram based.

287. A method according to claim 176, further comprising utilizing a
client.

288. A method according to claim 287, wherein said client comprises 
at least one of the following: 

end point software; end point hardware; tamper resistant software;
tamper resistant hardware; client side software; and client side hardware.

289. A method according to claim 287, comprising utilizing said 
client for at least one of the following: 

monitoring of client side storage;
monitoring of client side access;
monitoring of client side usage;
monitoring of client side distribution;
monitoring of copying of information object excerpts;
monitoring of clipboard;
monitoring of at least one application;
monitoring of at least one interface;
control of at least one application;
control of at least one interface;
control of clipboard;
control of copying of information object excerpts;
control of client side storage;
control of client side access;
control of client side usage; and
control of client side distribution.



290. A method according to claim 176, comprising utilizing
comparing of at least two information objects to calculate pairwise similarity
between objects.

291. A method according to claim 290, comprising utilizing said
pairwise similarity to map said information objects to a space.

292. A method according to claim 291, wherein said space is a
Euclidean space, and wherein the closeness between any two objects within said
Euclidean space is approximately proportional to said pairwise similarity
between said information objects.

293. A method according to claim 291, wherein said space is a
weighted graph, and wherein the weight of an edge between any two objects
within said graph space is approximately proportional to said pairwise similarity
between said information objects.

294. A method according to claim 291, wherein said space is a graph,
and wherein the existence of an edge between any two objects within said graph
space is dependent on said pairwise similarity between said information objects.

295. A method according to claim 291, wherein said space is utilized
to identify at least one similarity information class, wherein said information
class consists of at least two information objects, wherein said information class
policy is a policy shared by the information class, and wherein said similarity
information class is bounded within said space.

296. A method according to claim 291, comprising utilizing said
space to identify at least one information object substantially similar to an
unidentified information object.



297. A method according to claim 291, comprising using said space to
identify at least one other information object substantially similar to an
information object for which policy is not known, thereby to obtain a policy
associated with said other information object to use as basis for a policy for said
information object.
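
The weighted-graph variant of the space (claims 293 and 294) and the policy lookup of claim 297 can be sketched together. The example assumes each information object is represented by a set of elementary information unit identifiers and uses Jaccard similarity as the pairwise measure; the measure, the threshold and all names are assumptions for illustration, not the measure defined by the claims.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Pairwise similarity: shared identifiers divided by all identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(objects: Dict[str, Set[str]], threshold: float) -> Dict[Tuple[str, str], float]:
    """Edges exist only where similarity reaches `threshold`; weight = similarity."""
    names = sorted(objects)
    edges: Dict[Tuple[str, str], float] = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            weight = jaccard(objects[first], objects[second])
            if weight >= threshold:
                edges[(first, second)] = weight
    return edges


def inherit_policy(unknown: Set[str], objects: Dict[str, Set[str]],
                   policies: Dict[str, str], threshold: float) -> Optional[str]:
    """Return the policy of the most similar object that has a known policy."""
    candidates = [(jaccard(unknown, objects[name]), name) for name in sorted(policies)]
    if not candidates:
        return None
    score, name = max(candidates)
    return policies[name] if score >= threshold else None


library = {
    "contract_v1": {"u1", "u2", "u3", "u4"},
    "contract_v2": {"u1", "u2", "u3", "u5"},
    "newsletter": {"u8", "u9"},
}
known_policies = {"contract_v1": "internal-only", "newsletter": "public"}
print(similarity_graph(library, 0.5))
print(inherit_policy({"u1", "u2", "u3", "u6"}, library, known_policies, 0.5))
```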

298. A method according to claim 176, comprising storing
information about said information object in a database.

299. A method according to claim 176, further comprising extracting 
a descriptor of said information object, based on statistical analysis of said 
information object.

300. A method according to claim 176, comprising storing the order 
of said elementary information units within said information object in a 
database. 

301. A method according to claim 300, comprising using said order
for identification of said information object.

302. A method according to claim 176, further comprising interfacing
with at least one of: an information management system; and a document
management system.

303. A method according to claim 176, further comprising tracking at 
least one of the following: 

usage patterns; storage patterns; and distribution patterns. 
304. A method according to claim 303, wherein said tracking is
carried out to infer information about at least one of the following:

normal usage patterns; normal storage patterns; normal distribution
patterns; irregular usage patterns; irregular storage patterns; and irregular
distribution patterns.

305. A method according to claim 304, wherein said inferred 
information is used to define at least part of a policy. 

306. A method according to claim 304, comprising using said inferred 
information for information object classification. 

307. A method according to claim 176, further comprising logging. 

308. A method according to claim 307, wherein said logging
comprises logging of at least one of the following:

actions; events; and information object identification.

309. A method according to claim 307, wherein at least part of said
logging is controlled by a policy.

310. A method according to claim 307, wherein at least part of said 
logging is stored in a database. 

311. A method according to claim 307, comprising utilizing said
logging to augment lifecycle information for said information object.

312. A method according to claim 176, further comprising assessing
the integrity of at least one information object, wherein said integrity
assessment consists of comparing said information object with a version of said
information object for which integrity is assured.
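
The integrity assessment of claim 312 compares an information object with a version whose integrity is assured. One common way to do this without retaining the full trusted copy is to compare cryptographic digests; this is a substituted technique shown for illustration, and the function names are assumptions.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """SHA-256 digest of the object content, as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def integrity_ok(candidate: bytes, trusted_digest: str) -> bool:
    """True when the candidate matches the digest of the assured version."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest(candidate), trusted_digest)


trusted = digest(b"approved text")
print(integrity_ok(b"approved text", trusted))
print(integrity_ok(b"approved text!", trusted))
```

A failed check could then feed the reactions described in the dependent claims, such as replacing the object with the assured version or disallowing its distribution, storage or usage.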
313. A method according to claim 312, further comprising issuing a
certificate of said integrity for at least one information object.

314. A method according to claim 313, wherein said certificate is a
cryptographic certificate.



315. A method according to claim 312, further comprising replacing 
said information object with said version of said information object for which 
said integrity is assured. 

316. A method according to claim 312, comprising identifying when 
said integrity of said information object is not satisfactory, and in such a case 
not allowing distribution of said information object.

317. A method according to claim 312, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing storage of said information object.

318. A method according to claim 312, comprising identifying when
said integrity of said information object is not satisfactory, and in such a case
not allowing usage of said information object.

319. A method according to claim 176, further comprising defining at
least one constituent information object to be an ignored information object, and
wherein, whenever said ignored information object is an element of a
compound information object, ignoring said object in identification of said
compound information object.

320. A method according to claim 199, further comprising changing
access control information in accordance with said policy.

321. A method according to claim 176, further comprising not
allowing usage of respective ones of said information objects outside an 
organization. 
322. A method according to claim 176, further comprising not
allowing storage of respective ones of said information objects outside an
organization.

323. A method according to claim 176, further comprising not
allowing distribution of respective ones of said information objects outside an
organization.

324. A method according to claim 201, wherein said policy comprises
at least one mandatory lifecycle.

325. A method according to claim 324, wherein said action is
dependent on the matching of said mandatory lifecycle with a lifecycle of a
respective event.

326. A method according to claim 324, wherein said mandatory 
lifecycle comprises at least one mandatory recipient of said information object; 
and an order of events concerning said information object.

327. A method according to claim 205, wherein said inserting an 
additional part to said information object comprises inserting at least one of the 
following: a header; a footer; and a disclaimer. 

328. A method according to claim 199 or claim 213, further
comprising defining areas and wherein said policy is dependent on whether an
action is taken inside a given defined area.

329. A method according to claim 199 or claim 213, further
comprising defining areas and wherein said policy is dependent on whether an
event occurs inside a given defined area.
330. A method according to claim 176 or claim 179, comprising using
said deducing to locate at least one information object with similar content to a
given information object.

331. A method according to claim 199, comprising attaching a
respective policy to information objects according to their logical location
within an information storage medium.

332. A method according to claim 331, further comprising utilizing a
crawler for automatic location of information objects within said information
storage medium.

333. A method according to claim 332, wherein said information
storage medium is a file system.

334. A method according to claim 330, wherein said locating is done 
in an information storage medium. 

335. A method according to claim 334, further comprising utilizing a
crawler for automatic location of information objects within said information
storage medium.

336. A method according to claim 334, wherein said information storage
medium comprises at least one file system.

337. A method for automated computerized exchange of information 
within an information object having overall coherency, the method comprising 
selecting amongst and carrying out at least one of the following: 

deleting part of said information;
replacing part of said information; and
inserting an additional part to said information,
wherein said carrying out additionally comprises preservation of the 
coherency of said information within said information object. 
338. A method according to claim 337, wherein said changing of said
information is done in order to eliminate parts having policies that do not allow
for at least one action to be executed while they are in the document.

339. A method according to claim 337, wherein said changing of said
information is carried out in order to personalize said information.

340. A method according to claim 337, wherein said changing of said
information is carried out in order to customize said information for a specific
use.

341. A method according to claim 337, wherein said preserving said 
coherency comprises at least one of: 

maintaining seamlessness; preserving the structure of said information;
preserving the linguistic coherency of said information; preserving the
formatting style of said information; and preserving the pagination style of said
information.
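
A very small instance of deleting part of an information object while preserving its structure (claims 337, 338 and 341) is removing numbered clauses whose policy blocks an action and renumbering the remaining clauses so that the list stays seamless. The line format, the function name and the idea of a set of blocked clause numbers are assumptions made for this sketch.

```python
import re
from typing import List, Set


def redact_clauses(lines: List[str], blocked: Set[int]) -> List[str]:
    """Drop numbered clauses listed in `blocked` and renumber the rest."""
    kept: List[str] = []
    next_number = 1
    for line in lines:
        match = re.match(r"^(\d+)\.\s+(.*)$", line)
        if match is None:
            kept.append(line)  # headings and other text pass through unchanged
            continue
        if int(match.group(1)) in blocked:
            continue  # this clause's policy disallows the intended action
        kept.append(f"{next_number}. {match.group(2)}")
        next_number += 1
    return kept


document = ["Terms", "1. Scope of work", "2. Confidential pricing", "3. Governing law"]
print(redact_clauses(document, {2}))
```

Cross-references between clauses would need the same renumbering to keep the text coherent; that step is omitted here.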

342. A method according to claim 337, wherein said information
objects comprise compound information objects and wherein said changing of
said information object is made to constituent parts of a compound information
object.

343. A method according to claim 337, carried out over a network
having users with different access rights to said information object, said
selecting and carrying out being to adapt said information object to conform to
access rights of one of said users to whom said information object is released.

344. Apparatus for automatic information identification to enforce an
information management policy on information objects, the apparatus
comprising:

a scanning module for finding elementary information units within said 
information object; and 
a deduction module for deducing information about the identity of said
information object from identification of said elementary information units
found within said information object, said deduced identity being usable to
obtain a corresponding policy rule for applying to said information object.

345. Apparatus according to claim 344, wherein said information 
objects comprise at least one simple information object, said simple information 
object comprising one of the following: 

an elementary information unit;
a set of elementary information units; and
an ordered set of elementary information units.

346. Apparatus according to claim 344, wherein said elementary 
information units comprise at least one of the following: 

a sentence; a sequence of words; a word; a sequence of characters; a
character; a sequence of numbers; a number; a sequence of digits; a digit; a
vector; a curve; a pixel; a block of pixels; an audio frame; a musical note; a
musical bar; a visual object; a sequence of video frames; a sequence of musical
notes; a sequence of musical bars; and a video frame.

347. Apparatus according to claim 344, wherein said deduction 
module is further configured to assign elementary information unit identifiers to 
elementary information units after identification. 



348. Apparatus according to claim 347, wherein said deduction 
module is further configured to utilize said elementary information unit 
identifiers in said deducing. 

349. Apparatus according to claim 344, wherein said information
object identification is carried out on an instance of said information object, said
information object instance being said information object in a specific format.

350. Apparatus according to claim 347, wherein said deduction
module is configured to provide said elementary information unit identifiers in a
manner determined at least partly by the content of said elementary information
units which they are assigned to.

351. Apparatus according to claim 350, wherein said elementary
information unit identifiers are solely determined by said content.

352. Apparatus according to claim 347, wherein said deduction
module is configured to provide said elementary information unit identifiers in
a manner at least partly determined by locations within an information object of
respective elementary information units to which they are assigned.

353. Apparatus according to claim 344 further comprising a policy
attachment unit associated with said deduction module, said policy attachment
unit being configured to use said deducing to attach to said information object
an information object policy, said policy comprising at least one of the
following:

an allowed distribution of said information object;
a restriction on distribution of said information object;
an allowed storage of said information object;
a restriction on storage of said information object;
an action to be taken as a reaction to an event;
an allowed usage of said information object; and
a restriction on usage of said information object.

354. Apparatus according to claim 344, wherein said deducing
comprises utilizing conditional probabilities for at least one of the following:

identification of information objects;
classification of information objects; and
identification of a knowledge domain of information objects.
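
The use of conditional probabilities in claim 354 can be illustrated with a naive Bayes style classifier over elementary information units. This is a sketch under explicit assumptions: units are treated as independent given the class, all classes have equal prior probability, and add-one smoothing handles unseen units. None of these choices, nor the function names, are stated in the claims.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def train(labeled_units: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count how often each unit was observed in each class."""
    return {label: Counter(units) for label, units in labeled_units.items()}


def classify(units: Iterable[str], model: Dict[str, Counter]) -> str:
    """Return the class maximising the product of P(unit | class)."""
    if not model:
        raise ValueError("model must contain at least one class")
    vocabulary = set().union(*model.values())
    observed = list(units)
    best_label, best_score = "", -math.inf
    for label in sorted(model):
        counts = model[label]
        total = sum(counts.values())
        # Sum of log-probabilities, with add-one smoothing for unseen units.
        score = sum(
            math.log((counts[unit] + 1) / (total + len(vocabulary) + 1))
            for unit in observed
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train({
    "finance": ["revenue", "forecast", "revenue", "margin"],
    "legal": ["clause", "liability", "clause", "party"],
})
print(classify(["revenue", "margin"], model))
```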
355. Apparatus for automated computerized exchange of information
within an information object having overall coherency, the apparatus
comprising a selector for selecting amongst at least one of the following data
modifications:

a deletion of part of said information;
a replacement of part of said information; and
an insertion of an additional part to said information,

the apparatus further comprising a data modification unit associated with
said selector for carrying out said selected modification within said information
object, said data modification unit being associated with a coherency retention
module for detecting coherency features of said information object and altering
said modification in order to preserve said detected coherency features within
said information object.

356. Apparatus for automatic information identification of
information objects, the apparatus comprising:

a scanning module for finding elementary information units within said
information object; and

a deduction module for deducing information about the identity of said
information object from identification of said elementary information units
found within said information object, said deduced identity being usable for
controlling use of said information object.

357. Apparatus according to claim 356, wherein said deduction
module is further configured to assign elementary information unit identifiers to
elementary information units after identification.
STATEMENT UNDER ARTICLE 19 (1)

The amendments do not add matter to the application since they are based on 
wording already present in the summary of the invention. 